CREATION OF VARIABLE LENGTH AND SEQUENCE LINKER
REGIONS FOR DUAL-DOMAIN OR MULTI-DOMAIN
MOLECULES

FIELD OF THE INVENTION 

This invention, in the field of molecular biology, relates to libraries of dual-
domain nucleic acids and/or proteins in which the domains are joined by a library of
linkers that vary in length and sequence.

BACKGROUND OF THE INVENTION 

Dual-domain polypeptides or dual-domain nucleic acids encoding such 
polypeptides may have new, advantageous properties compared to the original 
polypeptides or nucleic acids after which they are patterned. Such polypeptide domains 
are generally linked using a linker region or linker domain. A generic designation of 
such a polypeptide construct is D1-L-D2, wherein D1 and D2 are two structural domains
that are identical or different and L is the linker. For example, two cytosolic domains of
the membrane-spanning protein adenylyl cyclase coupled with a linker domain form a
soluble protein (Tang et al., Science, 268: 1769-1772 (1995)). An advantage of this
soluble form of adenylyl cyclase, which retains enzymatic activity, is that it can be
produced in much higher quantities than the native enzyme (Dessauer et al., J. Biol.
Chem., 16967-16974 (1996)).

Another type of polypeptide generated by linking two domains is a single chain 
antibody or scFv. These single chain polypeptides include the variable (V) regions from 
the heavy (H) and light (L) chains of a selected immunoglobulin (Ig) and recreate the
antigen binding site of the native Ig while being a fraction of its size (Skerra, A. et al.
(1988) Science, 240: 1038-1041; Pluckthun, A. et al. (1989) Methods Enzymol. 178:
497-515; Winter, G. et al. (1991) Nature, 349: 293-299; Bird et al. (1988) Science
242:423; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879; U.S. Patents No.
4,704,692, 4,853,871, 4,946,778, 5,260,203, 5,455,030). A number of U.S. patents and
international patent publications of J. Huston and colleagues describe various two chain
or two domain proteins, including single chain antibodies, joined by linker peptides and
optionally including cleavable sites (U.S. Patents No. 5888773, 5877305, 5861156,
5837846, 5753204, 5534254, 5525491, 5482858, 5476786, 5330902, 5302526,
5258498, 5132405, 5091513, 5013653; WO 9323537A1 (25-NOV-1993)).

An scFv is composed of a Vh domain at its N-terminus and a Vl domain at its C- 
terminus (or vice versa) linked by a peptide linker. Correct folding of the Vh and Vl 
regions is crucial for retention of antigen binding capacity by the scFv. The length and 
sequence of the linker region are critical parameters for correct folding and biological 
function. scFv chains are easier to express than the larger Fv fragments or even larger 
Ig molecules (which are four chain complexes). 

A ribozyme is a catalytic RNA molecule that cleaves other RNA molecules that 
contain nucleic acid sequences complementary to particular targeting sequences in the 
ribozyme. Two identical or different nucleic acid domains, such as two ribozyme
domains, can be joined to create a bifunctional ribozyme that can act on more than one
RNA substrate structure. General methods for constructing ribozymes, including
hairpin ribozymes, hammerhead ribozymes and RNAse P ribozymes, are known in the
art. Castanotto et al. (1994) Advances in Pharmacology, 25: 289-317, reviews
ribozymes (including group I, hammerhead, axehead, hairpin and RNAse P). Ribozymes
that can advantageously target desired specific sequences, such as HIV sequences, have
been described (Ho, A. et al., WO 9426877 (1994); Yu et al. (1993) Proc. Natl. Acad. Sci.
USA, 90:6340-6344; and Dropulic et al. (1992) J. Virol., 66:1432-1441).

The hammerhead ribozyme and the hairpin ribozyme are catalytic molecules
with antisense and endoribonucleotidase activity. Their intracellular expression can
confer significant resistance to, for example, HIV infection. Hammerhead ribozymes
are described in Rossi et al. (1991) Pharmac. Ther., 50:245-254; Forster et al. (1987)
Cell 48:211-220; Uhlenbeck, O.C. (1987) Nature, 328:596-600; Haseloff, J. et al. (1988)
Nature, 334:585; Dropulic et al., supra; and Castanotto et al., supra, and references
cited therein. Hairpin ribozymes are disclosed in Hampel et al. (1990) Nucl. Acids Res.,
18:299-304; Hampel et al., EP 0360257 (1990); Haseloff, J.P. et al., US 5,254,678
(1993); Kraus, G. et al., US 5,958,768 (1999); Ho, A. et al., WO 9426877 (1994);
Ojwang et al. (1992) Proc. Natl. Acad. Sci. USA, 89: 10802-10806; Yamada et al. (1994)
Gene Therapy 1: 39-45; Leavitt et al. (1995) Proc. Natl. Acad. Sci. USA, 92: 699-703;
Leavitt et al., Human Gene Therapy, 5: 1151-1120; and Yamada et al. (1994) Virology,
205: 121-126.

For convenience, the conventional single letter nucleotide code to designate 
positions wherein more than one base may be present is provided in Table 1. 



TABLE 1

Symbol   For RNA          For DNA          Note
r        g or a           g or a           (purine)
y        u or c           t or c           (pyrimidine)
s        g or c           g or c
w        a or u           a or t
v        a, g or c        a, g or c
x        c, u, or a       c, t, or a
n        a, g, c, or u    a, g, c, or t

(Obviously, in an r:y pairing, if r=g then y=c, etc.) 
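
By way of illustration only, the codes of Table 1 can be captured in a small lookup table and used to enumerate the concrete sequences that a degenerate pattern stands for. The following is a minimal sketch for the DNA alphabet; the names DEGENERATE_DNA and expand are hypothetical and are introduced here solely for the example.

```python
from itertools import product

# Single-letter codes of Table 1 (DNA alphabet). The dictionary name and the
# function name are illustrative assumptions, not part of the specification.
# Note: "x" (c, t, or a) follows Table 1; IUPAC notation writes this set as "h".
DEGENERATE_DNA = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga",    # purine
    "y": "tc",    # pyrimidine
    "s": "gc",
    "w": "at",
    "v": "agc",
    "x": "cta",
    "n": "agct",
}


def expand(pattern, codes=DEGENERATE_DNA):
    """Return every concrete sequence represented by a degenerate pattern."""
    choices = [codes[symbol] for symbol in pattern.lower()]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    # A triplet such as nws stands for 4 * 2 * 2 = 16 concrete triplets.
    print(len(expand("nws")), expand("nws"))
```

The number of sequences grows as the product of the number of bases allowed at each position, which is why short degenerate patterns can describe large populations.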



The typical substrate sequence for hairpin ribozymes is
nnng/cn*gucnnnnnnnn (where n*g is the cleavage site). The hammerhead
ribozyme cleaves at any nux sequence. Thus, the same substrate target within the 
hairpin leader sequence, guc, is targetable by the hammerhead ribozyme. 
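
As a rough sketch of how the two substrate motifs just described might be located in an RNA sequence, the following example scans for hammerhead nux sites and for the hairpin substrate pattern. It assumes that "g/c" denotes g or c at that position; the regular expressions, the function name find_sites and the test sequence are illustrative assumptions only.

```python
import re

# Lookahead groups are used so that overlapping sites are all reported.
# nux: any base, then u, then c, u or a (x as defined in Table 1).
HAMMERHEAD = re.compile(r"(?=([acgu]u[acu]))")
# nnn(g/c)n*gucnnnnnnnn, reading "g/c" as "g or c" (an assumption).
HAIRPIN = re.compile(r"(?=([acgu]{3}[gc][acgu]guc[acgu]{8}))")


def find_sites(rna, pattern):
    """Return (start index, matched sequence) for each site in an RNA string."""
    rna = rna.lower()
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    rna = "acggagucaaaaaaaa"  # hypothetical 16-base target
    print(find_sites(rna, HAIRPIN))
    print(find_sites(rna, HAMMERHEAD))
```

In this hypothetical target the guc triplet inside the hairpin substrate is also reported as a hammerhead site, consistent with the observation that the same guc target is accessible to both ribozyme types.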

Two DNA domains can also be linked to form a dual-domain DNA molecule.
Certain DNA domains bind to proteins such as DNA polymerases, endonucleases, and
transcription factors. Thus, two DNA domains can be linked to form a dual-
domain DNA molecule that binds one or more DNA-binding proteins.

Those skilled in the art will know of the existence of other nucleic acid or 
polypeptide domains that may be advantageously linked to form a dual-domain nucleic
acid or polypeptide with one or more functions. Those of skill will also recognize the 
general desirability of methods that yield such products. 

The desired property of a dual-domain DNA, ribozyme or protein molecule can 
be optimized by modifying the nucleic acid that (1) constitutes the DNA domain, (2) 
encodes the ribozyme sequence or (3) encodes the protein domain. This is achieved
through a variety of conventional techniques. In one approach, the sequence or length
of the linker region is varied in an effort to optimize the dual-domain molecule. The
length and sequence of the linker region may indeed be critical to the function of a dual- 
domain protein. 

Methods for generating an scFv dual-domain protein with linkers of varying
peptide length are known in the art (e.g., U.S. 5,837,242). Changes in sequence or
length of the linker can adversely affect the stability, protease susceptibility, binding
activity and expression levels of the scFv. Because the effect of a change in linker
sequence or length on the function(s) of the dual-domain polypeptide has been generally 
unpredictable, the effect on bioactivity of varying particular amino acid residues in the 
linker or changing its overall length generally cannot be determined a priori. 

There is thus a need for methods that permit creation of a nucleic acid library 
that encodes D1-L-D2 (or higher order) structures wherein L has random length and 
sequence. The dual-domain protein can be expressed from the library and the properties 
of interest can be analyzed. Once a protein is identified as having "optimal" properties, 
its sequence can be determined by resolving the nucleotide sequence of the clone that 
encodes that protein. This approach obviates the necessity of creating and testing 
individual clones until finding one with the desired property. 

The polymerase chain reaction (PCR) has been used to generate libraries of
nucleic acid products that have two domains connected by a linker having different 
sequences or different lengths. No currently available method permits simultaneous 
introduction of both random length and random sequence into the linker region of a 
population of nucleic acids. 

Expression Systems 

Many expression systems for heterologous proteins are known in the art. These 
include bacterial systems which have the advantages of rapid and abundant production, 
but are limited in many instances by their inability to produce properly folded and 
soluble proteins (unless the proteins are subjected to cycles of denaturation and 
renaturation). Baculovirus systems drive expression through the secretory pathways of 
insect cells, thereby increasing the probability of improved protein solubility
(Kretzschmar, T. et al. (1996) J. Immunol. Methods 195:93-101; Brocks, B. et al.
(1997) Immunotechnology 3:173-184). Because manipulating the virus and growing
insect cells can be time consuming and costly, the system is less suitable for expression 
of certain types of proteins, for example tumor-specific or individual-specific proteins 
such as idiotypic scFv polypeptides. There is therefore a need in the art for suitable 
rapid and economical expression systems to produce useful dual-domain proteins, one 
example of which is an idiotypic scFv vaccine for treating B-cell lymphoma. The 
present invention addresses this need. 

SUMMARY OF THE INVENTION 

The present inventors have conceived of an approach for generating a
library of dual-domain or multi-domain (>2) polypeptides from appropriate coding
nucleic acids, which library is characterized by the members having random linkers
linking each pair of polypeptide domains, wherein the random linkers have variable
length and sequence. The nucleotide sequences encoding the linkers comprise a
repeated pattern of degenerate triplet bases. The first and second (and/or higher order)
domains may be the same or different from one another. The amino acid composition of 
an entire linker region may include between 1 and about 20 different amino acids with 
each repeated pattern of degenerate triplet bases encoding between 1 and about 12 
different amino acids. The preferred linker length ranges from 1 to 50 amino acids. In 
one embodiment, the polypeptide is a single chain immunoglobulin or single chain 
antibody (scFv) molecule wherein one domain is an immunoglobulin Vh domain and 
the other domain is an immunoglobulin Vl domain. 

More specifically, the present invention is directed to a library of dual-domain 
nucleic acid molecules each of which has (a) a first and a second domain; (b) separating 
and linking the domains, a linker which is a member of a randomized library of linkers 
that (i) vary in size and nucleotide sequence, and (ii) consist of a repeated pattern of
degenerate repeated triplet nucleotides.

In the above library, the repeated pattern of degenerate repeated triplet
nucleotides of the linkers has the following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of 
the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 
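
One possible reading of properties (i) to (iii), offered here only as an assumption, is that a property holds when the sets of bases allowed at the two named positions of the degenerate triplet share no nucleotide, so that the two positions can never be identical. Under that reading, a triplet written in the codes of Table 1 can be checked as sketched below; the names CODES and satisfied_rules are hypothetical.

```python
# Table 1 codes for DNA (repeated here so the example is self-contained).
CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ga", "y": "tc", "s": "gc", "w": "at",
    "v": "agc", "x": "cta", "n": "agct",
}


def satisfied_rules(triplet):
    """Report which of properties (i)-(iii) hold for a degenerate triplet.

    Assumed interpretation: a property holds when the two positions it names
    have no allowed base in common.
    """
    p1, p2, p3 = (set(CODES[symbol]) for symbol in triplet.lower())
    return {
        "i": p1.isdisjoint(p2),    # position 1 vs position 2
        "ii": p2.isdisjoint(p3),   # position 2 vs position 3
        "iii": p1.isdisjoint(p3),  # position 1 vs position 3
    }


if __name__ == "__main__":
    for triplet in ("rst", "vwc", "nnn"):
        print(triplet, satisfied_rules(triplet))
```

Because the properties are joined by "or", a triplet would qualify under this reading if at least one entry is true; a fully random triplet such as nnn satisfies none of them.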

Preferably, the nucleotide in the first and second positions of each repeated 
triplet is selected from any two of deoxyadenosine, deoxyguanosine, deoxycytidine or 
deoxythymidine. In another embodiment, (i) position 1 of each repeated triplet is 
deoxyadenosine or deoxyguanosine; (ii) position 2 of each repeated triplet is 
deoxycytidine or deoxyguanosine; and (iii) position 3 of each repeated triplet is 
deoxythymidine. 
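
To make the last embodiment concrete, the following sketch enumerates the codons permitted when position 1 is a or g, position 2 is c or g, and position 3 is t, and translates them with the standard genetic code. The compact codon table construction and the function name encoded_amino_acids are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order for each codon position
# (third position varying fastest). "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def encoded_amino_acids(pos1, pos2, pos3):
    """Map each codon allowed by the three position sets to its amino acid."""
    return {
        "".join(codon): CODON_TABLE["".join(codon)]
        for codon in product(pos1, pos2, pos3)
    }


if __name__ == "__main__":
    # Position 1: a or g; position 2: c or g; position 3: t.
    print(encoded_amino_acids("ag", "cg", "t"))
```

For this triplet the four possible codons act, agt, gct and ggt encode threonine, serine, alanine and glycine respectively, and none is a stop codon, so every repeat contributes one of four small residues to the linker.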

In another embodiment, two different repeated patterns of degenerate triplet 
bases are combined to generate a population of linkers used to produce dual-domain 
molecules. The combination of different repeated patterns of degenerate triplet bases is 
used to increase the complexity of the linker sequences obtained from the population. 
The different repeats can also be used to introduce differing structural or biochemical 
properties to the linker region. For example, degenerate triplet vwc and degenerate 
triplet nvt are used as the nontemplated sequence. In this example, the degenerate 
linker sequence is (vwc)x(nvt)y, where x = 1 to 20 and y = 1 to 20. This combination
would produce linkers containing different combinations of amino acids within each
repeat as well as differing length of linkers. 
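
The size of the sequence space described by the (vwc)x(nvt)y example can be estimated with a short calculation, and individual members can be drawn at random. The sketch below assumes that every allowed base is equally likely at each degenerate position, which is a simplification; the function names linker_count and random_linker are hypothetical.

```python
import random

# Bases allowed by the Table 1 codes used in the example (DNA alphabet).
V, W, N = "agc", "at", "agct"


def linker_count(x, y):
    """Number of distinct nucleotide sequences of the form (vwc)x(nvt)y."""
    vwc = len(V) * len(W)  # 6 concrete triplets per vwc repeat
    nvt = len(N) * len(V)  # 12 concrete triplets per nvt repeat
    return vwc ** x * nvt ** y


def random_linker(x, y, rng=random):
    """Draw one (vwc)x(nvt)y sequence, assuming uniform base usage."""
    vwc = [rng.choice(V) + rng.choice(W) + "c" for _ in range(x)]
    nvt = [rng.choice(N) + rng.choice(V) + "t" for _ in range(y)]
    return "".join(vwc + nvt)


if __name__ == "__main__":
    print(linker_count(1, 1))   # 6 * 12 = 72
    print(random_linker(3, 2))  # 15 nucleotides, i.e. 5 codons
```

With x and y each ranging from 1 to 20, the encoded linkers span 2 to 40 codons in length, so length and sequence vary together within one population.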

In one embodiment of the above library, at least one of the domains binds to a 
protein. In another embodiment, both of the domains bind to a protein. 

In yet another embodiment, at least one, preferably both, of the domains binds to 
a nucleic acid that is not a member of the library. 

In any of the above nucleic acid libraries, the first and the second domains are 
preferably coding sequences.

The library, as described above, is preferably produced in plants or plant cells.

The present invention also provides a dual-domain or multi-domain nucleic acid 
molecule selected out from the library described above. 

Also provided is a library of dual-domain polypeptide molecules each of which is
described by the formula D1-L-D2 (going from N-terminus to C-terminus) wherein

(a) D1 and D2 are polypeptide domains and

(b) L is a peptide or polypeptide linker which is a member of a randomized
library of linkers that vary in size and sequence, which library is encoded
by nucleic acid sequences consisting of a repeated pattern of degenerate 
repeated triplet nucleotides. 

In a preferred embodiment, the present invention is directed to a library of multi- 
domain polypeptide molecules each of which comprises polypeptide domains D, each 
pair of D's being linked by a peptide or polypeptide linker L, such that each molecule is 
described by the formula 

DxLy 

wherein x is an integer between 2 and about n, wherein n is preferably about 20, y is an
integer between 1 and (n-1), with the proviso that for any value of x, y is preferably x-1;
D1 is bonded to a single C-terminal linker; Dn (the "ultimate" C-terminal domain) is
bonded to a single N-terminal linker; each of D2 to Dn-1 is bonded to an N-terminal and
a C-terminal linker; each L is a member of a randomized library of linkers that vary in
size and sequence, which linker library is encoded by nucleic acid sequences consisting
of a repeated pattern of degenerate repeated triplet nucleotides. 
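
The arrangement described by the formula DxLy, with y = x-1, amounts to alternating domains and linkers from the N-terminus to the C-terminus. A minimal sketch of that bookkeeping follows; the function name assemble and the placeholder strings are hypothetical and stand in for real domain and linker sequences.

```python
def assemble(domains, linkers):
    """Join domains D1..Dx with linkers L1..Ly, requiring y == x - 1."""
    if len(domains) < 2:
        raise ValueError("need at least two domains")
    if len(linkers) != len(domains) - 1:
        raise ValueError("expected exactly one linker between each pair of domains")
    parts = [domains[0]]
    # Each remaining domain is preceded by the linker that joins it to the
    # domain on its N-terminal side.
    for linker, domain in zip(linkers, domains[1:]):
        parts.extend([linker, domain])
    return "".join(parts)


if __name__ == "__main__":
    print(assemble(["D1", "D2", "D3"], ["-La-", "-Lb-"]))
```

In such a construct the first domain carries only a C-terminal linker, the last domain only an N-terminal linker, and every internal domain is flanked on both sides.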

A preferred library is a library of dual-domain polypeptide molecules each of 
which is described by the formula D1-L-D2 wherein

(a) D1 and D2 are polypeptide domains and

(b) L is a peptide or polypeptide linker which is a member of a randomized 
library of linkers that vary in size and sequence, which library is encoded 
by nucleic acid sequences consisting of a repeated pattern of degenerate 
repeated triplet nucleotides.

In the above libraries of dual- or multi-domain polypeptide molecules, each
linker in the library preferably (i) has a length of between about 1 and 50 amino acid 
residues and (ii) consists of between 1 and about 20 different amino acids and (iii) each 
repeated pattern of degenerate triplet bases encodes between 1 and 12 different amino 
acids. 

In the library of dual domain or multi-domain polypeptide molecules above, the 
repeated pattern of degenerate repeated triplet nucleotides encoding the linkers 
preferably has the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of 
the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 

Preferably, the nucleotide in the first and second positions of each repeated triplet is 
selected from any two of deoxyadenosine, deoxyguanosine, deoxycytidine or 
deoxythymidine. In one embodiment thereof (i) position 1 of each repeated triplet is 
deoxyadenosine or deoxyguanosine; (ii) position 2 of each repeated triplet is 
deoxycytidine or deoxyguanosine; and (iii) position 3 of each repeated triplet is 
deoxythymidine. 

The above library of dual- or multi-domain polypeptides is preferably produced 
in plant cells. 

Specific embodiments of this invention include any dual-domain (or multi- 
domain) polypeptide molecule selected from the library as described above. One 
embodiment provides a three domain peptide selected from the above library which is a
dual domain scFv polypeptide linked to a third polypeptide domain, wherein the third
domain is preferably a toxin polypeptide with therapeutic utility or an enzyme with
diagnostic utility or use as a research tool. The foregoing polypeptides are preferably
produced in plant cells.

This invention is further directed to a method for generating the library of dual-
domain nucleic acids as above, comprising:

a. obtaining two template DNA sequences that comprise the first and the second
domains;

b. preparing amplification primer pairs which amplify the first and second domains
where each primer pair comprises an upstream primer and a downstream primer,
each primer having a 5' end and a 3' end, wherein the downstream primer for
the first domain or the upstream primer for the second domain comprises a 
nontemplated sequence, 

the nontemplated sequence comprising a repeated pattern of degenerate 
repeated triplet nucleotides, wherein at least two of the 5' terminal 
triplets of the repeated pattern of degenerate repeated triplet nucleotides 
have the same degenerate sequence; 

c. amplifying the domains with the amplification primers to generate at least one
population of nucleic acid domains having different lengths and sequences in the
nontemplated sequence; and

d. ligating the nucleic acid domains generated in step (c) to generate a
population of dual-domain molecules.

In the above method, the repeated pattern of degenerate repeated triplet nucleotides in at 
least one of the primers preferably has the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of 
the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 

In one embodiment of the above libraries of dual- or multi-domain polypeptide
molecules, a linker in the library that is 10 or more residues in length should
contain at least three different residues and a linker in the library that is 20 or
more residues in length should contain at least four different residues.

In the above method, at least one of the primers preferably contains a non-
templated endonuclease recognition site. 

In the foregoing methods, the template DNA sequences are preferably made by 
reverse transcription of mRNA. 

The method may further comprise the step of ligating the population of dual-
domain nucleic acids to vectors, and may further comprise the step of introducing the vectors
into a host. In these methods, the nucleic acid domains generally will encode 
polypeptide domains, and the method preferably also comprises the step of expressing 
dual-domain polypeptides encoded by the dual-domain nucleic acids. In an additional 
step, the method may comprise the step of transcribing RNA from the vectors. 

For plant expression, the vectors should be compatible with replication and/or 
expression of the nucleic acids in plant cells. The method preferably includes the steps 
of introducing the transcribed RNA into a plant cell and expressing the dual-domain
(or multi-domain) polypeptide. 

This invention also provides a population of dual-domain polypeptides or a dual- 
domain polypeptide selected from that population, produced by the method described 
above. Preferably the population or selected polypeptide is produced in plant cells. 

Also provided is a method of producing a dual domain (or, with appropriate 
modifications, a multi-domain) polypeptide, comprising the steps of: 

(a) joining a nucleic acid encoding the first domain of the polypeptide to a nucleic 
acid encoding a first part of a linker to produce a first nucleic acid construct;

(b) joining the nucleic acid encoding a second part of the linker to a nucleic acid 
encoding the second domain of the polypeptide to produce a second nucleic acid 
construct; 

(c) incorporating the first and the second constructs into a transient plant expression
vector in frame so that, when expressed, the polypeptide bears the first and
second domain separated by the linker as described by the formula D1-L-D2;

(d) transfecting a plant (or plant cell) with the vector so that the plant transiently 
produces the polypeptide; and 

(e) recovering the polypeptide as a soluble, functionally folded protein.

General References

Unless otherwise indicated, the practice of many aspects of the present invention 
employs conventional techniques of molecular biology, recombinant DNA technology 
and immunology, which are within the skill of the art. Such techniques are described in 
more detail in the scientific literature, for example, Sambrook, J. et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring
Harbor, NY, 1989; Ausubel, F.M. et al., Current Protocols in Molecular Biology, Wiley-
Interscience, New York, current volume; Alberts, B. et al., Molecular Biology of the
Cell, 2nd Ed., Garland Publishing, Inc., New York, NY (1989); Lewin, B.M., Genes IV,
Oxford University Press, Oxford (1990); Watson, J.D. et al., Recombinant DNA, Second
Edition, Scientific American Books, New York, 1992; Darnell, J.E. et al., Molecular
Cell Biology, Scientific American Books, Inc., New York, NY (1986); Old, R.W. et al.,
Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2nd Ed.,
University of California Press, Berkeley, CA (1981); DNA Cloning: A Practical
Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., Current
Edition); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., Current Edition);
Transcription and Translation (B. Hames & S. Higgins, eds., Current Edition);
Methods in Enzymology: Guide to Molecular Cloning Techniques (Berger and Kimmel,
eds., 1987); Harlow, E. et al., Antibodies: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY, 1988; Coligan, J.E. et al., eds., Current
Protocols in Immunology, Wiley-Interscience, New York, 1991. Protein structure and
function is discussed in Schulz, GE et al. Principles of Protein Structure, Springer- 
Verlag, New York, 1978, and Creighton, TE, Proteins: Structure and Molecular 
Properties, W.H. Freeman & Co., San Francisco, 1983. 

DEFINITIONS 

As used herein, the following terms have the meanings ascribed to them unless 
specified otherwise. 

A polypeptide or protein "domain" generally refers to a region of a polypeptide 
chain that is folded in a way that confers a particular structure and/or biochemical
function (Schulz et al., supra). Domains can be defined in structural or functional
terms. A functional domain can be a single structural domain, but may also include
more than one structural domain. Such functions can include enzymatic catalytic 
activity, ligand binding, chelating of an atom or endogenous fluorescence. As discussed 
above, and of particular importance to this invention, Vh and Vl regions of Ig molecules 
each form single structural domains, which act in concert in forming an antigen- 
combining site. A domain's function is dictated to a large extent by the distinct shapes 
into which it folds. Although most commonly used to describe proteins, a "domain" can
also describe a region of a nucleic acid, either the coding sequence of a polypeptide 
domain, or a nucleic acid structure that carries out a particular function (e.g., a 
ribozyme's catalytic activity or protein binding). Binding domains, defined by binding 
to a binding partner (receptor or ligand) are exemplified by the Vh and Vl regions of Ig 
molecules (see below), each of which forms a single structural domain and which act in
concert in forming an antigen-combining site. Other well-known binding domains are
extracellular domains of cell surface receptors that bind a respective ligand, for
example, a peptide hormone. Moreover, the portion of a polypeptide or peptide ligand,
such as erythropoietin, GM-CSF or enkephalin, that binds to its respective receptor is
considered a functional (binding) domain. Parts of proteins that are responsible for the
capacity to fluoresce (e.g., green fluorescent protein - GFP) are also considered
functional domains. 

A binding domain of a DNA or RNA molecule is a part of the molecule that 
binds a protein (preferably) such as a transcription factor (e.g., cAMP Response Element 
Binding Protein (CREB)), a restriction enzyme (e.g., EcoR I) or a DNA polymerase
(e.g., Taq DNA Polymerase). 

The present invention is directed in part to methods for creating dual-domain 
molecules. In preferred dual-domain molecules, the linker region between the two
domains is varied whereas the sequence of the linked domains is held constant.

"Template DNA" refers to the DNA that is amplified by "amplification primer
pairs" (the population of oligonucleotide primers used in the amplification reaction).
This DNA may be produced by biological (recombinant) or synthetic (chemical) means.
Further, mRNA may be reverse transcribed to form the template DNA that is used in the
amplification reaction.

An "upstream primer" is an oligonucleotide primer, or a mixture of
oligonucleotide primers, that anneal(s) to the antisense strand of the template DNA.

A "downstream primer" is an oligonucleotide primer, or a mixture of
oligonucleotide primers, that anneal(s) to the sense strand of the template DNA.

A "nontemplated sequence" is the portion of an amplification primer that
contains a repeated nucleotide triplet. As the goal of this sequence is to introduce
variability into the linker library, it is not complementary to the DNA sequence being
amplified, e.g., the polypeptide domain-coding regions.

The phrase "repeated pattern of degenerate triplet bases" refers to a nucleic 
acid sequence wherein a set of three bases (a triplet) is repeated in the nontemplated 
sequence, creating a repeating motif where the individual bases in the repeating triplet 
are independently selected from a defined array. For example, where the repeated triplet 
is nws (see Table 1), n can be any of a, c, g, or t; w can be a or t; and s can be
g or c, rendering the repeated pattern degenerate. Herein, these repeated triplets are
adjacent to each other. The nontemplated sequence of the amplification primer that
contains this "repeated pattern of degenerate triplet bases" is produced in vitro.

"Amplifying/amplification" refers to a reaction wherein the entire template 
DNA, or portions thereof, are duplicated at least once, preferably many times. 

"Ligating/ligation" refers to covalent coupling of two or more DNA strands (3' 
end to 5' end) using enzymatic and/or chemical methods. 

A "nontemplated endonuclease recognition site" is a sequence within the 
nontemplated sequence that is recognized by a restriction endonuclease. 

One use of the term "library" herein refers to a population, set or collection of 
nucleic acid molecules consisting of domains joined by linker sequences, which linkers 
vary in size and nucleotide sequence and which are produced using the methods 
described. The number of library members contained in the library which differ in 
nucleotide sequence is determined by the number of sequences contained in the repeated 
pattern of degenerate triplet bases. The term "library" is also applied to the population
of polypeptides encoded by the nucleic acid library. 

As used herein, a "linker" at the nucleic acid level is a nucleic acid molecule or 
sequence that joins two nucleic acid domains or two nucleic acid sequences encoding 
two polypeptide domains. The linker sequence has a pattern of degenerate repeated 
triplet nucleotides with the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of
the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 

At the protein level, the linker is the peptide expression product of the linker nucleic 
acid sequence. In a preferred embodiment, the present linker excludes such sequences 
that encode (or are) Gly4Ser or repeats thereof.

As used herein, a "library of linkers" (or "linker library") at the nucleic acid 
level is a set or collection or population of nucleic acid molecules or sequences each of 
which joins two nucleic acid domains or two nucleic acid sequences encoding two 
polypeptide domains, each library member of which has a pattern of degenerate repeated 
triplet nucleotides with the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as position 2 of
the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as position 3 of 
the repeated triplet. 

At the protein level, the linker library is the set of expression products of the population 
of linker nucleic acid members of the library. 

A "single-chain antibody" (scFv; also termed "scAb" by others) is a single 
chain polypeptide molecule wherein an Ig heavy chain variable (Vh) domain and an Ig 
light chain variable (Vl) domain are artificially linked by a relatively short peptide
linker that allows the scFv to assume a conformation which retains binding capacity and
specificity for the antigen (or epitope) against which the original antibody (from
which the Vh and Vl domains are derived) was specific. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a Western blot analysis of scFv proteins generated in Example 1 
in plant protoplasts. CJ is the scFv with the (Gly4Ser)3 linker. The number of the lane
refers to the # of the clone. The size in kilodaltons (kD) is shown on the left. 

Figure 2 shows a Western blot analysis of scFv proteins generated in Example 2 
in whole plants. CJ is the scFv with the (Gly4Ser)3 linker. The number of the lane
refers to the # of the clone. The size in kDa is shown on the left. 

Figure 3 shows Coomassie stained SDS-PAGE analysis of scFv proteins 
generated in Example 3 in whole plants. The number of the lane refers to the # of the 
clone and the arrow indicates the scFv protein. The size in kDa is shown on the left. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention employs expression systems, preferably plant-based, to 
produce dual-domain proteins, for example, individualized tumor-specific immunogens 
for treating B cell lymphoma. The plant-based transient heterologous expression system 
described herein produces correctly folded polypeptides in surprisingly high abundance 
and with surprisingly potent immunogenicity. This system allows rapid and economical 
production of useful quantities of such proteins or polypeptides. 

The nucleic acid encoding the dual-domain product is introduced into plants 
using an appropriate plant virus vector, described in detail below, leading to expression 
and rapid production of appropriately folded dual-domain protein in plant cells, plant 
parts and whole plants. 

The selection of (1) appropriate linkers and (2) the transient expression system, 
as described herein, ensures that useful dual-domain polypeptide molecules are secreted 
by the plant cells in a form that is folded in solution in a conformation that permits their 
use for their intended purpose, e.g., as tumor-specific immunogens. An scFv produced 
according to this invention is advantageously obtained as the predominant secreted 
protein species in those plant cells into which it has been successfully incorporated. 
This permits simple selection and straightforward, rapid purification for the uses 
described herein, including as a vaccine composition. 

While plant expression systems are preferred for reasons enumerated herein, the 
invention is not intended to be limited to any particular system. The present approaches 
for generation of random linker libraries of varying degrees of complexity in the 
production of dual domain (or multi-domain) nucleic acids and proteins can be applied 
to other prokaryotic and eukaryotic hosts, for example bacteria, yeast cells or 
mammalian cells. 

In addition to the scFv vaccines comprising Ig V domains that are described 
below, the present invention can be applied directly to other protein antigens which can 
be expressed in plants in a similar manner to achieve proper folding and enhanced 
immunogenicity. Examples include antigens that are common to a particular type of 
tumor or family of tumors, such as carcinoembryonic antigen (CEA), prostate-specific 
antigen (PSA) present in prostate adenocarcinomas, tyrosinase present in melanomas, 
and many other known and yet undiscovered tumor antigens. Another type of clonally- 
distributed (self) antigen is a T cell receptor (TCR) domain that includes a portion of the 
α, β, γ or δ chain V region (or a combination thereof). Such TCR-based antigens can 
be markers, and therefore targets, in certain T cell leukemias and lymphomas as well as 
in autoimmune diseases. Thus, autoimmune diseases associated with identifiable T cell 
clones or with usage of a particular TCR chain V region are modulated/treated by 
immunizing with a polypeptide antigen corresponding to TCR V region polypeptides 
that is made by the approach described herein. 

Other dual domain proteins within the scope of the invention include a viral coat 
protein domain combined with another domain of interest. If necessary, this molecule is 
purified taking advantage of the coat protein's characteristics. 

The protein domains are not limited to those expressed on the cell surface; dual 
domain proteins wherein one or both polypeptides are derived from a cytosolic protein 
or a protein that functions in soluble form are also intended. Examples include 
cytokines such as IL-1β and polypeptide hormones. 

Other preferred polypeptide domains that are linked as dual- or multi-domain 
proteins using the linker approach of the present invention are transcription factors. 
These can be assembled so that active domains of different transcription factors that act 
in concert or sequentially are combined as single chain molecules separated by linkers. 
The linker size and complexity is chosen on the basis of the functional requirements for 
the transcription factors, e.g. , the distance between the nucleic acid binding sites for 
these factors if they must bind and act at about the same time. Such dual domain or 
multi-domain polypeptides would be expected to show advantageous properties in 
promoting, activating or orchestrating transcriptional events. This may be particularly 
useful in cases where more than one factor must act and one is limiting in its 
concentration or availability. This limitation is overcome by creating an artificial dual 
domain or multi-domain transcription factor where the domain of the otherwise limiting 
factor is always linked to a domain or domains of one or more nonlimiting transcription 
factors. 

Alternatively, a transcription factor domain may be linked using the present 
approach to an inhibitory moiety such as a toxin so that binding of the transcription 
factor domain to its target DNA permits the toxin to perform its function and inhibit 
transcription or otherwise block a cellular function. Use of the stimulatory or inhibitory 
transcription factor constructs with linkers having the appropriate flexibility could 
permit the attainment of new levels of control over cellular functions not heretofore 
possible using mixtures of proteins or by protein domains that have been linked by a 
limited array of preselected individual linkers. The random linker library approach 
generates a much larger array of choices that can be selected by appropriate means as 
described herein. 

The dual- or multi-domain polypeptides prepared in accordance with this 
invention using the random linker library approach can be delivered to a target cell 
exogenously, or can be combined in an expression system that is inserted into the target 
cell and functions autonomously or under the control of cellular factors. This can be 
accomplished using routine methods of molecular biology and conventional vectors 
such as viral vectors that deliver the nucleic acid encoding the polypeptides to the 
appropriate cells by selective or nonselective means. 

The product of the present invention may be used in the form of a dual (or multi) 
domain nucleic acid molecule, for example, a bifunctional DNA vaccine that is intended 
for administration to a subject and, when expressed, produces an immunogenic dual 
domain protein in the subject. 

Unless otherwise indicated, the practice of the present invention employs 
conventional techniques of molecular biology, recombinant DNA technology and 
immunology, which are within the skill of the art. Such techniques are described in 
more detail in the references listed earlier. 

Focusing on a linker region L between two polypeptide domains, it may be 
difficult to predict what amino acid substitutions or additions will optimize a particular 
property of the linker, and therefore, of the multi-domain polypeptide as tested, for 
example, in a biochemical or biological assay. The length and the sequence of L can 
affect the activity of the polypeptide product because of an impact on properties such as 
solubility, folding and conformation, protease susceptibility or expression level. 

The present invention provides approaches for creating a nucleic acid library 
that, when expressed, results in a library of polypeptides with linker regions that are 
variable in both length and sequence. This invention permits a practitioner to create and 
analyze such libraries, thereby providing advantages over the prior art where either 
length or sequence, but not both, could be varied. 

The present invention is based on the use of known template nucleic acids that 
encode the protein domains of interest. The nucleic acid encoding a first domain is 
amplified in a PCR reaction using an upstream primer that is complementary to the 
antisense strand of the template and a downstream primer that is complementary to the 
sense strand of the template DNA and that may contain repeated triplets of nucleotides 
at its 5' end. 

Then the nucleic acid for the second domain is amplified in a PCR reaction with 
an upstream primer that is complementary to the antisense strand of the template DNA 
and that may have a repeated nucleotide triplet sequence at its 5' end and with a 
downstream primer that is complementary to the sense strand of the template DNA. 

To obtain the desired variability in length and sequence, the downstream 
primer for the first domain and/or the upstream primer for the second domain must 
contain the repeated triplet of nucleotides. The resulting two PCR products are then 
combined to form a nucleic acid that encodes a dual-domain protein, or contains the 
dual DNA or dual RNA domains that are linked by the linker region. The resultant 
molecules (protein, DNA or RNA) can then be analyzed by a variety of means known to 
those of skill in the art. 

The structures of proteins and nucleic acids and their domains are determined by 
well-known biochemical and biophysical methods, in particular X-ray crystallography 
and two-dimensional nuclear magnetic resonance (2D-NMR) spectroscopy. Inspection 
of a 3D structure may be sufficient to delineate a macromolecule's domains. For 
example, the 3D structure of the dimeric enzyme glutathione reductase illustrates that 
each subunit is composed of three structural domains: an FAD binding domain, an NADP 
binding domain and a third domain that forms the interface between the dimers. See 
Schulz et al., supra. The Ig Vh and Vl domains cooperate to form the antibody's 
antigen binding pocket. Thus these structural domains fold into distinct shapes that are 
important for the molecule's function. 

CLONING OF DOMAINS 

A domain may be isolated by any of a number of techniques. In general, a 
nucleic acid sequence encoding a polypeptide (or RNA) domain of interest is cloned 
from an appropriate cDNA library or a genomic DNA library based on hybridization 
with an oligonucleotide probe that represents the domain. 

For the present invention, preferred nucleic acids and proteins are mammalian, 
more preferably human sequences. 

Alternatively, the DNA is isolated by amplification techniques using 
oligonucleotide primers starting with a DNA or RNA template (see, e.g., Dieffenbach 
et al., PCR Primer: A Laboratory Manual (1995)). These primers can be used to 
amplify either a full length coding sequence or a partial sequence that could constitute a 
probe (ranging in length up to about several thousand nucleotides). The resultant probe 
sequence is then used to screen a mammalian library for the full-length nucleic acid of 
interest. Use of synthetic oligonucleotide primers and amplification of an RNA or DNA 
template is described in U.S. Patents 4,683,195 and 4,683,202 and in PCR Protocols: A Guide 
to Methods and Applications (Innis et al., eds., 1990). Methods such as PCR and ligase 
chain reaction (LCR) can be used to amplify nucleic acid sequences of domains directly 
from mRNA, from cDNA, or from genomic or cDNA libraries. Degenerate 
oligonucleotides can be designed to amplify domain homologues using the known 
sequences that encode the domain. Restriction endonuclease sites can be incorporated 
into the primers. Genes amplified by the PCR reaction can be purified on agarose gels 
and cloned into an appropriate vector. 

In expression cloning, nucleic acids are isolated from expression libraries using 
as a probe an antibody (or other binding partner) specific for an epitope of the expressed 
polypeptide. Polyclonal or monoclonal antibodies (mAbs) can be raised by 
immunization with one or more peptide fragments of the domain being cloned. 

Nucleic acid probes, preferably oligonucleotides, are used, preferably under 
stringent hybridization conditions, to screen libraries in order to isolate polymorphic 
variants or alleles of the genes that encode the polypeptide domain of interest. 
Alternatively, antibody-based expression cloning permits cloning of polymorphic or 
allelic variants or interspecies homologues. 

Selection of sources for the cDNA library and its production from mRNA are 
done using conventional methods (Gubler et al., Gene 25:263-269 (1983); Sambrook et 
al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Current Protocols in 
Molecular Biology (Ausubel et al., eds., 1994 or latest edition)). 

Methods for preparing genomic DNA libraries are conventional in the art. For 
example, DNA extracted from a tissue may be mechanically sheared or enzymatically 
digested to yield fragments of about 12-20 kb that are separated by gradient 
centrifugation and inserted into appropriate expression vectors. These vectors are 
packaged into phage in vitro. Recombinant phage are analyzed by plaque hybridization 
(Benton et al., Science 196:180-182 (1977)). Colony hybridization is carried out, for 
example, as generally described by Grunstein et al., Proc. Natl. Acad. Sci. USA 
72:3961-3965 (1975). 

Synthetic oligonucleotides can be used to construct recombinant "genes" for use 
as probes or for expression of the domain polypeptides. 

Oligonucleotides can be chemically synthesized using solid phase 
phosphoramidite triester methods (Beaucage et al., Tetrahedron Letts. 22:1859-1862 
(1981)) using an automated synthesizer (Van Devanter et al., Nucleic Acids Res. 
12:6159-6168 (1984)). Purification of oligonucleotides is typically by native 
acrylamide gel electrophoresis or by anion-exchange HPLC (Pearson et al., J. Chrom. 
255:137-149 (1983)). 

Sequences of cloned genes and synthetic oligonucleotides can be verified by 
conventional methods such as the chain termination method (Wallace et al., Gene 
16:21-26 (1981)) using a series of overlapping oligonucleotides, usually 40-120 bp in 
length, representing both the sense and antisense strands of the gene. 

The nucleic acid encoding the desired polypeptide is typically cloned into an 
intermediate vector before transformation or transfection of prokaryotic or eukaryotic 
cells for replication and/or expression of the nucleic acid. These intermediate vectors, 
e.g., plasmids or shuttle vectors, are typically for use in prokaryotic cells. 

LINKER REGION 

Functions of the linker L are to join a first and a second polypeptide (or nucleic 
acid) domain as a single macromolecule, permit the two domains to fold correctly and 
thereby assemble into a functional molecule. In the scFv embodiment where the amino 
acid linker L links the Vh and Vl domains, L may vary in length between 1 and about 
50 residues. An individual L preferably is composed of between 1 and about 20 
different amino acids, and each repeated pattern of degenerate triplet bases encodes 
between 1 and about 12 different amino acids. An optimal linker contributes 
significantly to the correct folding of the Vh and Vl domains so that the resulting scFv 
(a) is soluble and (b) binds antigen or (c) is able to act as an antigen to elicit a relevant 
immune response. 

In one embodiment, the linker is resistant to cleavage by proteases that the 
final product is expected to encounter when being used. 

In contrast, the linker may also be designed to incorporate an amino acid or short 
sequence that serves as a cleavable site for a protease that can be used to separate the 
one or several domains from one another at an appropriate time. 

Additionally, the linker may be designed to confer affinity to another molecule 
or matrix, facilitating subsequent purification of the expressed fused domains 
based on the properties of the linker. One example includes incorporation of a histidine 
(His) tag that permits purification on a metal (e.g., nickel) affinity column. Other 
affinity tags are well-known in the art and need not be described here. 

Depending on the two domains being linked, the sequence and length of L can 
vary widely. 

Linkers may be selected based on their ability to fuse two polypeptide domains 
and, at the same time, facilitate purification and characterization based on the properties 
of one (or both) domains. Examples include fusions of a selected protein domain and 
glutathione S-transferase (GST), which can then be purified on an affinity matrix of 
glutathione-agarose (Smith et al. (1988) Gene 67:31-40). The linker used by Smith et 
al. was later modified by Guan et al. (Anal. Biochem. 192:262-267 (1991)) to introduce a 
glycine rich stretch known as a "glycine kinker" having the amino acid sequence 
PGISGGGGG [SEQ ID NO:1]. Such a linker, within the scope of this invention, 
facilitates the cleavage of GST from its fusion partner (in that example, a protein 
tyrosine phosphatase). 

Vectors for producing these kinds of fusion proteins are well-known in the art, 
and many are commercially available. For example, New England Biolabs provides 
pMAL-p2, a vector that encodes a maltose binding protein that can be fused to a domain 
sequence that is cloned into the vector. In pMAL-p2, the amino acid sequence of the 
linker between the maltose-binding protein and the added domain is 
NNNNNNNNNNLGIEGR [SEQ ID NO:2]. The stretch of asparagines facilitates 
purification of the fusion protein on an amylose affinity column. 

A linker that has been used to link Ig Vh and Vl domains into an scFv is the 15 
amino acid sequence GGGGSGGGGSGGGGS (SEQ ID NO:3), commonly designated 
(Gly4-Ser)3. A number of other linkers for scFv production have been described in 
Lawrence et al., FEBS Letters 425:479-484 (1998), Solar et al., Protein Engineering 
8:717-723 (1995), Alfthan et al., Protein Engineering 8:725-731 (1995), Newton et al., 
Biochemistry 35:545-553 (1996), Ager et al., Human Gene Therapy 7:2157-2164 
(1996) and Koo et al., Applied and Environmental Microbiology 64:2490-2496 (1998). 
The library approach of this invention will generate many useful linkers beyond those 
noted above. 

Creation of Variable Length and Sequence in the Linker Region 

A preferred approach is to create a library of two domain polypeptides (D1-L-D2) 
wherein each library member varies from all others in L. In other words, randomness 
between the domains is found in the linkers that link them. This permits the generation 
of an array of D1-L-D2 products, particularly in a plant expression system, from which 
one can select one, or an array, of optimally folded, optimally functioning products. 

In this approach, two cloned domains are amplified and a linker of variable 
length and variable sequence is introduced between them using an amplification method 
such as PCR. To achieve this, a portion of the 3' end of the downstream primer for the 
upstream domain and the 3' end of the upstream primer for the downstream domain are 
complementary to the respective domain sequence being amplified. ("Downstream" and 
"upstream" are relative to the linker.) However, a portion of the 5' end of the 
downstream primer for the upstream domain and/or the 5' end of the upstream primer 
for the downstream domain is not complementary to the respective domain being 
amplified. This noncomplementary segment of the primers, termed a "nontemplated 
sequence," contains a repeated pattern of degenerate triplet bases which, at the nucleic 
acid level, join the upstream to the downstream domain. 
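
By way of non-limiting illustration, the primer layout described above can be sketched in a few lines of Python. The function names, the 18-nucleotide annealing length and the short domain sequences are hypothetical and are not taken from the Examples. The sketch assumes that the degenerate triplet is specified in the sense (coding) orientation, so that the downstream primer for the upstream domain carries its reverse complement.

```python
# IUPAC complements, including the degenerate codes used for linker triplets.
COMPLEMENT = {
    "a": "t", "c": "g", "g": "c", "t": "a",
    "r": "y", "y": "r", "s": "s", "w": "w",
    "k": "m", "m": "k", "b": "v", "v": "b",
    "d": "h", "h": "d", "n": "n",
}


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def linker_primers(d1_sense, d2_sense, triplet, repeats, anneal_len=18):
    """Build the two primers that flank the linker, both written 5'->3'.

    The 3' part of each primer anneals to its template; the 5' part is the
    nontemplated run of degenerate triplets.
    """
    nontemplated = triplet.lower() * repeats
    # Downstream primer for the upstream domain (D1): it anneals to the sense
    # strand at the 3' end of D1, so every part of it is a reverse complement.
    d1_downstream = (reverse_complement(nontemplated)
                     + reverse_complement(d1_sense[-anneal_len:]))
    # Upstream primer for the downstream domain (D2): same sense as the template.
    d2_upstream = nontemplated + d2_sense[:anneal_len].lower()
    return d1_downstream, d2_upstream


if __name__ == "__main__":
    # Hypothetical 12-nt "domains", chosen only to keep the output readable.
    down, up = linker_primers("atgaaaggcatg", "catgccaaattt", "rst", 2, anneal_len=6)
    print(down)  # asyasycatgcc
    print(up)    # rstrstcatgcc
```

Written 3' to 5', the reverse-complemented repeat asy reads ysa, which is the form in which the lower strand of a duplex is conventionally drawn.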

The upstream and downstream primers for amplifying D1 and D2 are mixed with 
a DNA polymerase and other necessary reactants for amplification. See Innis et al., 
supra, for details. The reaction mixture is subjected to multiple temperature cycles to 
melt DNA duplexes, allow annealing of primers to template and polymerization of the 
PCR product. During the first cycle the DNA polymerase carries out "first strand" 
synthesis until the temperature is raised sufficiently to melt the duplexes. Thereafter, 
when the temperature is lowered to the annealing temperature, the primers will anneal to 
the first strand DNA. The DNA polymerase will then make a "second strand" as the 
polymerization temperature of the cycle is reached. This results in exponential 
accumulation of the domain being amplified. Because of the nontemplated sequences, 
the amplified domain-encoding DNA will form a population (library) of molecules with 
a repeated pattern of degenerate bases at the 3' end of the upstream product and the 5' 
end of the downstream product. 

Due to the nature of the repeated pattern of degenerate triplet bases in the 
nontemplated sequences of the amplification pairs, the PCR products are diverse in 
sequence and length in the L region. The length diversity is most likely due to duplex 
formation of the L region of the primers with bubbles or loops in the middle due to base 
pair mismatching. The 3'-5' exonuclease and the 5'-3' polymerase activities serve to 
delete or extend the length of the primer sequence. 

To shorten the L sequence, a primer containing the repeated triplet is annealed to 
a complementary strand that has already incorporated the L sequence. The degenerate 
primer can then anneal to form a duplex with a bubble at the site of unpaired bases, and 
leave an unpaired 3' extension (overhang), as diagrammed below (underscored). 

Duplex with bubble and 3' overhang 

           RST-RST-RST-RST-RST 
          /                   \ 
    5' RST                     CAT-GCC 3' 

    3' YSA-YSA-YSA-YSA-YSA-YSA-GTA-CGG 5' 

(upper sequence is SEQ ID NO:50; lower sequence is SEQ ID NO:51) 

An enzyme such as Pfu or Vent that has 3'-5' exonuclease activity will degrade 
the 3' extension in the 5' direction of the complementary strand until it reaches the 
annealed portion of the duplex. In this manner one or more triplet repeats can be 
removed from the PCR product, thereby shortening the peptide linker L by one (or 
more) amino acids. 






For extension of the linker L, the "top" strand can anneal to the complementary 
strand so that a duplex with a 5' extension is formed, as follows: 

Duplex with bubble and 5' overhang 

    5' RST-RST-RST-RST-RST-RST-CAT-GCC 3' 

    3' YSA                     GTA-CGG 5' 
          \                   / 
           YSA-YSA-YSA-YSA-YSA 

(upper sequence is SEQ ID NO:50; lower sequence is SEQ ID NO:51) 

The polymerase present in the amplification reaction, e.g., Taq polymerase, can 
extend the PCR product by one or more triplet repeat codons. Because of its 5'-3' 
polymerase activity, the enzyme can fill in the 5' extension, thereby lengthening the 
linker region by one or more repeated triplets. This will extend the peptide linker by 
one or more amino acids. If the polymerase in the PCR lacks 3'-5' exonuclease activity, 
and if no enzyme with 3'-5' exonuclease activity is present, then only extensions of 
triplet nucleotides should occur. 
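
The qualitative effect of the two enzyme activities on linker length can be captured in a toy model. The sketch below is only an illustration under stated assumptions: the function name and the max_shift parameter (the largest number of triplets gained or lost in one mispairing event) are hypothetical, and no claim is made about the frequencies with which the different lengths actually arise.

```python
def reachable_repeat_counts(start_repeats, max_shift, has_exonuclease):
    """Triplet-repeat counts reachable after one mispairing event (toy model).

    start_repeats   -- number of degenerate triplets designed into the primer
    max_shift       -- assumed largest gain or loss, in whole triplets
    has_exonuclease -- True if a 3'-5' exonuclease activity is present
    """
    if start_repeats < 1 or max_shift < 0:
        raise ValueError("start_repeats must be >= 1 and max_shift >= 0")
    # Deletions need the 3'-5' exonuclease; extensions only need the polymerase.
    lowest = max(1, start_repeats - max_shift) if has_exonuclease else start_repeats
    highest = start_repeats + max_shift
    return list(range(lowest, highest + 1))


if __name__ == "__main__":
    print(reachable_repeat_counts(6, 2, has_exonuclease=True))   # [4, 5, 6, 7, 8]
    print(reachable_repeat_counts(6, 2, has_exonuclease=False))  # [6, 7, 8]
```

Because every change is counted in whole triplets, each reachable linker length in nucleotides is a multiple of three, so the reading frame of the downstream domain is unaffected in this model.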

To promote bubble formation, the 5' end of at least one primer must contain the 
same degenerate bases in at least two terminal codons to prevent slippage. That is, there 
must be two triplet repeats with the same sequence (e.g., 5' rst-rst 3', or 
5' ysa-ysa 3', etc.) at the 5' end of at least one of the primers used to amplify a 
domain. 

To retain the proper reading frame, which is important if the fused nucleic acid is 
to express a protein (as is the case with an scFv), several rules should be observed in 
designing the degeneracy of the nontemplated region of the primers that will be the L 
region. The degenerate triplet repeats should obey one of the following rules: 

(a) position 1 of the triplet cannot contain the same base as position 2; or 

(b) position 2 of the triplet cannot contain the same base as position 3; or 

(c) position 1 of the triplet cannot contain the same base as position 3. 
For example, the repeated triplets rst and ysa obey these rules. The following 
combinations of bases fulfill those rules: rst = agt, act, ggt, gct and ysa = tca, 
tga, cca, cga. Other degenerate sequences can also fulfill these rules. For example, 
str (which can be gta, gtg, cta, or ctg) or ayr (which can be aca, acg, ata or 
atg) could serve as a repeated triplet. 

Another degenerate triplet sequence useful in this invention is nvt, which can be 
any of 12 different codons encoding 11 different amino acids. The degenerate triplet 
nws can be any of 16 different codons encoding 12 different amino acids. The 
degenerate triplet csy does not adhere to these rules because it could be ccc (which 
does not comply). Similarly, any other degenerate sequence that can be a triplet of 
identical bases (i.e., ccc, aaa, ggg, or ttt) would not obey these rules and would 
thus be excluded as a repeated triplet. 
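
The rules for the degenerate triplets lend themselves to a mechanical check. The following Python sketch is offered only as an illustration; the function names are hypothetical, and the rules are interpreted here (as an assumption) to mean that a triplet qualifies when at least one pair of positions has no base in common.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def expand(triplet):
    """List every concrete codon that a degenerate triplet can represent."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in triplet.lower()))]


def obeys_triplet_rules(triplet):
    """True if positions 1/2, 2/3 or 1/3 can never hold the same base."""
    p1, p2, p3 = (set(IUPAC[b]) for b in triplet.lower())
    return p1.isdisjoint(p2) or p2.isdisjoint(p3) or p1.isdisjoint(p3)


if __name__ == "__main__":
    for triplet in ("rst", "ysa", "str", "ayr", "nvt", "nws", "csy"):
        codons = expand(triplet)
        print(triplet, len(codons), obeys_triplet_rules(triplet), " ".join(codons))
```

Under this interpretation rst, ysa, str, ayr, nvt and nws all qualify, whereas csy is rejected because each of its positions can be c.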

Restriction enzyme recognition sequences can be incorporated into the primers 
to facilitate cloning and orientation of, for example, the Ig V region domains (or any 
other polypeptide domains) with respect to each other. For example, a restriction 
endonuclease site may be incorporated in the 5' end of the upstream amplification 
primer for the D1 domain, which will facilitate ligation of the 5' end of the upstream 
domain to the 5' end of a restricted vector into which that fragment is being subcloned. 
Likewise, the same or a different restriction site can be incorporated in the 5' end of the 
downstream amplification primer for the downstream domain. The resulting PCR 
product can then be restricted with the respective endonuclease(s) for subsequent 
ligation into a vector that has complementary sequence(s) to the PCR products. 
Alternatively, the same restriction site can be used, and the subclones can be screened by 
DNA sequencing, PCR, restriction enzyme digestion, etc., to determine if the correct 
orientation has been achieved. 
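
As a non-limiting illustration of incorporating a restriction site at the 5' end of an outer primer, the sketch below prepends a recognition sequence and checks that the same site does not also occur inside the domain. The function names, the two-base 5' clamp and the example sequences are assumptions made for this sketch only; the EcoRI (gaattc) and BamHI (ggatcc) recognition sequences are the well-known ones.

```python
# Recognition sequences of two common restriction endonucleases. Both are
# palindromic, so searching one strand is enough to find every occurrence.
SITES = {"EcoRI": "gaattc", "BamHI": "ggatcc"}


def add_site(primer, enzyme, clamp="gc"):
    """Prepend a restriction site (plus an assumed short 5' clamp) to a primer."""
    return clamp + SITES[enzyme] + primer.lower()


def site_is_absent(domain, enzyme):
    """True if the enzyme would not cut inside the domain sequence."""
    return SITES[enzyme] not in domain.lower()


if __name__ == "__main__":
    domain = "atggctgaagttcaactggtt"          # hypothetical domain fragment
    if site_is_absent(domain, "EcoRI"):
        print(add_site(domain[:6], "EcoRI"))  # gcgaattcatggct
```

A site that also occurs within a domain would cut the amplified product internally, so a check of this kind is a natural first step when choosing the enzyme.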

Ligation of the PCR products 

The 3' end of the upstream PCR product and the 5' end of the downstream PCR 
product can be ligated to one another (Methods in Enzymology: Guide to Molecular 
Cloning Techniques, Berger et al., eds., 1987). If both ends of these products are blunt, 
the 5' ends can be phosphorylated by T4 polynucleotide kinase and the reaction 
products ligated with T4 DNA ligase. If the ends of the PCR products are 
complementary or can be made complementary through restriction endonuclease 
digestion, then a sticky end ligation can be performed wherein the complementary ends 
are ligated with T4 DNA ligase. Likewise, the 5' end of the upstream PCR product 
and/or the 3' end of the downstream PCR product can be ligated to a restricted vector in 
a blunt end or a sticky end ligation. 

To increase the sequence and length complexity of the linker region of the 
population of dual-domain molecules, such as an scFv, multiple PCR reaction products 
of D1 and D2 can be combined. For example, a PCR reaction of D1 and/or D2 where the 
degenerate triplet is repeated six times can be combined with PCR reactions of D1 
and/or D2 where the degenerate triplet is repeated nine times and ligated into the 
appropriate vector. The combination of the PCR products will increase the length and 
sequence complexity observed in the L region. 
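
The gain in designed complexity from pooling reactions can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the count assumes that a single primer carries the nontemplated repeats and that no slippage-derived lengths are included, so it gives only the number of distinct designed nucleotide sequences.

```python
def designed_linker_sequences(codons_per_triplet, repeat_counts):
    """Number of distinct designed linker nucleotide sequences in a pool.

    codons_per_triplet -- concrete codons represented by the degenerate triplet
                          (e.g. 4 for rst, 12 for nvt)
    repeat_counts      -- repeat numbers of the pooled primers, e.g. [6, 9]
    """
    return sum(codons_per_triplet ** n for n in repeat_counts)


if __name__ == "__main__":
    print(designed_linker_sequences(4, [6]))     # 4096
    print(designed_linker_sequences(4, [6, 9]))  # 266240
```

For a four-codon triplet such as rst, adding the nine-repeat reaction to the six-repeat reaction raises the designed count from 4,096 to 266,240 sequences, before any length variation introduced during amplification.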

The complexity of the linker sequences obtained in the population or "library" 
can be pre-determined by the number of different amino acids designed into the 
nontemplated sequence of the PCR amplification primers used to amplify the domains. 
The number of amino acids encoded by the nontemplated sequence is determined by the 
nucleotide degeneracy designed into each codon triplet. 

In one example, the desired complexity of the linker sequence present in a 
library is limited to two amino acids, Ala and Gly. The nontemplated sequence 
preferred for this linker combination would be repeats of the codon triplet gst 
(= gct and ggt), where gct encodes Ala and ggt encodes Gly. 

In a second example, the desired complexity of the linker sequence present in a 
library is increased to six amino acids: Ala, Gly, Ser, Thr, Asn and Asp. The 
nontemplated sequence preferred for this linker combination would be repeats of the 
codon triplet rvt (= gct, ggt, agt, act, aat and gat), wherein the 
following amino acids are encoded: 

gct-Ala    ggt-Gly    aat-Asn 
agt-Ser    act-Thr    gat-Asp 
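
The amino acids encoded by a given degenerate triplet can be listed by expanding the triplet and translating each codon with the standard genetic code. The sketch below is an illustration only; the function and variable names are hypothetical. Under the standard code, aat translates to asparagine (Asn, N).

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}

# Standard genetic code, codons ordered t, c, a, g at each position ("*" = stop).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def encoded_residues(triplet):
    """Map each concrete codon of a degenerate triplet to its one-letter residue."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in triplet.lower())))
    return {codon: CODON_TABLE[codon] for codon in codons}


if __name__ == "__main__":
    for triplet in ("gst", "rvt", "nvt", "nws"):
        residues = encoded_residues(triplet)
        distinct = sorted(set(residues.values()) - {"*"})
        print(triplet, len(residues), "codons ->", len(distinct), "amino acids:", "".join(distinct))
```

The same listing also shows whether a triplet can yield a stop codon; nws, for example, includes tag, which is worth noting when such a triplet is considered for a linker that must be read through.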

The same approaches are used to generate multi-domain polypeptides of higher 
order, e.g., three- or four-domain polypeptides. These can comprise all different 
domains or one or more domains can be repeated. General structures for such molecules 
are as follows (where D is a polypeptide domain and L is a linker): 

D1-L1-D1 
D1-L1-D2 
D1-L1-D2-L2-D2 
D1-L1-D2-L2-D3 
D1-L1-D2-L2-D3-L3-D4 
etc. 


The different linkers between the various domains can vary in complexity. This will 
depend on the structural relationship required for the proper function of each domain for 
its intended purpose. Thus, in the example of an scFv molecule with a single idiotype or 
with a single ligand-binding specificity, the two domains must function in concert for 
proper binding. In a 3-domain polypeptide which is an scFv of desired binding 
specificity wherein the third domain D3 is a toxin, there are fewer constraints on the 
"interaction" between the toxin domain and either of the two binding domains. In that 
case, the linker L2 between one of the scFv domains and the toxin domain can be 
different from, and less complex than, the linker L1 between the two domains (D1 and D2) that 
comprise the scFv polypeptide. 

In a library of multi-domain polypeptides, not every pair of domains is 
necessarily joined by a linker according to the present invention. Thus, two or more 
adjacent domains may be (1) linked directly as may occur in their native state (if they 
are derived from naturally dual- or multi-domain proteins), or (2) linked by a 
"conventional" linker well-known in the art. In yet another embodiment, a particular 
linker identified using the present invention and derived as a member of a random linker 
library may be a preferred choice for use as a non-random linker between two given 
domains in a multi-domain polypeptide. These various embodiments can be depicted in 
the following (non-limiting) manner: 

D1-L1-D2-D3 
D1-L1-D2-D3-D4 
D1-L1-D2-D3-D4-D5 
D1-L1-D2-D3-D4-L2-D5 
etc. 

In the four formulas shown above, L1 and L2 indicate random linker members of the 
libraries of the present invention. All other domains shown bonded to adjacent domains 
without a linking L may be (1) directly bonded to one another as described above; (2) 
linked by a conventional linker known in the art; or (3) linked by a fixed linker 
discovered in a random linker library according to this invention but inserted as a 
predetermined, non-random, non-varying linker in the particular location. As noted in 
the Summary section above, a multi-domain polypeptide herein may be composed of up 
to about 20 domains. For example, a 10-domain polypeptide may have anywhere 
between 1 and 9 linkers L according to this invention. If a 10-domain polypeptide has 
only one such linker L1 linking two domains, the other 8 domains are either directly 
bonded to one another or linked by conventional or other predetermined linker groups. 

Expression System for Production of the Dual-domain Polypeptide 

A number of well-known heterologous expression systems in bacterial, insect, 
mammalian and plant cells were discussed above, each with its advantages and 
disadvantages. The present invention is particularly suited for plant expression. 

A number of transformation methods permit expression of heterologous proteins 
in plants. Some involve the construction of a transgenic plant by integrating DNA 
sequences encoding the protein of interest into the plant genome. The time it takes to 
obtain transgenic plants may be too long for the rapid production of certain embodiments 
such as a tumor vaccine polypeptide. An attractive solution (an alternative to such 
stable transformation) is transient transfection of plants with expression vectors. Both 
viral and non-viral vectors capable of such transient expression are available (Kumagai, 
M.H. et al. (1993) Proc. Natl. Acad. Sci. USA 90:427-430; Shivprasad, S. et al. (1999) 
Virology 255:312-323; Turpen, T.H. et al. (1995) Bio/Technology 13:53-57; Pietrzak, M. 
et al. (1986) Nucleic Acids Res. 14:5857-5868; Hooykaas, P.J.J. and Schilperoort, R.A. 
(1992) Plant Mol. Biol. 19:15-38), although viral vectors are easier to introduce into 
host cells and spread by infection to amplify expression, and are therefore preferred. 

Chimeric genes, vectors and recombinant viral nucleic acids of this invention are 
constructed using conventional techniques of molecular biology. A viral vector that 
expresses heterologous proteins in plants preferably includes (1) a native viral 
subgenomic promoter (Dawson, W.O. et al. (1988) Phytopathology 78:783-789 and 
French, R. et al. (1986) Science 231:1294-1297), (2) preferably, one or more non-native 
viral subgenomic promoters (Donson, J. et al. (1991) Proc. Natl. Acad. Sci. USA 
88:7204-7208 and Kumagai, M.H. et al. (1993) Proc. Natl. Acad. Sci. USA 90:427-430), 
(3) a sequence encoding viral coat protein (native or not), and (4) nucleic acid encoding 
the desired heterologous protein. Vectors that include only non-native subgenomic 
promoters may also be used. The minimal requirement for the present vector is the 
combination of a replicase gene and the coding sequence that is to be expressed, driven 
by a native or non-native subgenomic promoter. The viral replicase is expressed from 
the viral genome and is required for extrachromosomal replication. The subgenomic 
promoters allow the expression of the foreign or heterologous coding sequence and any 
other useful genes such as those encoding viral proteins that facilitate viral replication, 
proteins required for movement, capsid proteins, etc. The viral vectors are encapsidated 
by the encoded viral coat proteins, yielding a recombinant plant virus. This 
recombinant virus is used to infect appropriate host plants. The recombinant viral 
nucleic acid can thus replicate, spread systemically in the host plant and direct RNA and 
protein synthesis to yield the desired heterologous protein in the plant. In addition, the 
recombinant vector maintains the non-viral heterologous coding sequence and control 
elements for periods sufficient for desired expression of this coding sequence. 

The recombinant viral nucleic acid is prepared from the nucleic acid of any 
suitable plant virus, though members of the tobamovirus family are preferred. The 
native viral nucleotide sequences may be modified by known techniques providing that 
the necessary biological functions of the viral nucleic acid (replication, transcription, 
etc.) are preserved. As noted, one or more subgenomic promoters may be inserted. 
These are capable of regulating expression of the adjacent heterologous coding 
sequences in an infected or transfected plant host. Native viral coat protein may be 
encoded by this RNA, or this coat protein sequence may be deleted and replaced by a 
sequence encoding a coat protein of a different plant virus ("non-native" or "foreign 
viral"). A foreign viral coat protein gene may be placed under the control of either a 
native or a non-native subgenomic promoter. The foreign viral coat protein should be 
capable of encapsidating the recombinant viral nucleic acid to produce functional, 
infectious virions. In a preferred embodiment, the coat protein is a foreign viral coat 
protein encoded by a nucleic acid sequence that is placed adjacent to either a native viral 
promoter or a non-native subgenomic promoter. Preferably, the nucleic acid encoding 
the heterologous protein, e.g., an immunogenic polypeptide to be expressed in the plant, 
is placed under the control of a native subgenomic promoter. 

An important element of this invention, which is responsible in part for the proper 
folding and copious production of the heterologous protein (exemplified as the 
immunogenic scFv polypeptide), is the presence of a signal peptide sequence that directs 
the newly synthesized protein to the plant secretory pathway. The sequence encoding 
the signal peptide is fused in frame with the DNA encoding the polypeptide to be 
expressed. A preferred signal peptide is the α-amylase signal peptide. 
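
What "fused in frame" requires can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `fuse_in_frame` is hypothetical, and the two short sequences below are invented placeholders, not the α-amylase signal peptide sequence or any sequence from this disclosure. The check is simply that the upstream coding sequence is a whole number of codons and contains no stop codon, so that translation continues into the downstream sequence in its intended reading frame.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def fuse_in_frame(signal_cds, insert_cds):
    """Join a signal-peptide coding sequence to an insert, checking the frame.

    Raises ValueError if the upstream sequence would shift the reading frame
    of the insert or terminate translation before the junction.
    """
    signal_cds = signal_cds.lower()
    insert_cds = insert_cds.lower()
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal sequence is not a whole number of codons")
    codons = [signal_cds[i:i + 3] for i in range(0, len(signal_cds), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("stop codon upstream of the junction")
    return signal_cds + insert_cds


# Placeholder sequences for illustration only (hypothetical).
placeholder_signal = "atggctgga"
placeholder_insert = "atgcaggtt"
print(fuse_in_frame(placeholder_signal, placeholder_insert))
```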

In another embodiment, a sequence encoding a movement protein is also 
incorporated into the viral vector because movement proteins promote rapid cell-to-cell 
movement of the virus in the plant, facilitating systemic infection of the entire plant. 

Either RNA or DNA plant viruses are suitable for use as expression vectors. The 
DNA or RNA may be single- or double-stranded. Single-stranded RNA viruses 
preferably may have a plus strand, though a minus strand RNA virus is also intended. 
The recombinant viral nucleic acid is prepared by cloning in an appropriate 
production cell. Conventional cloning techniques (for both DNA and RNA) are well 
known. For example, with a DNA virus, an origin of replication compatible with the 
production cell may be spliced to the viral DNA. 

With an RNA virus, a full-length DNA copy of the viral genome is first prepared by 
conventional procedures: for example, the viral RNA is reverse transcribed to form 
subgenomic pieces of DNA which are rendered double-stranded using DNA polymerases. 
The DNA is cloned into an appropriate vector and inserted into a production cell. The 
DNA pieces are mapped and combined in proper sequence to produce a full-length DNA 
copy of the viral genome. Subgenomic promoter sequences (DNA), with or without a coat 
protein gene, are inserted into non-essential sites of the viral nucleic acid as described 
herein. Non-essential sites are those that do not affect the biological properties of the viral 
nucleic acid or the assembled plant virion. cDNA complementary to the viral RNA is 
placed under control of a suitable promoter so that (recombinant) viral RNA is produced in 
the production cell. If the RNA must be capped for infectivity, this is done by conventional 
techniques. 

Examples of suitable promoters include the lac, lacUV5, trp, tac, Ipl and ompF 
promoters. A preferred promoter is the phage SP6 promoter or T7 RNA polymerase 
promoter. 

Production cells can be prokaryotic or eukaryotic and include Escherichia coli, 
yeast, plant and mammalian cells. 

Numerous plant viral vectors are available and well known in the art (Grierson, 
D. et al. (1984) Plant Molecular Biology, Blackie, London, pp. 126-146; Gluzman, Y. et 
al. (1988) Communications in Molecular Biology: Viral Vectors, Cold Spring Harbor 
Laboratory, New York, pp. 172-189). The viral vector and its control elements must 
be compatible with the plant host to be infected. Suitable viruses are 

(a) those from the tobacco mosaic virus (TMV) group, such as TMV, tobacco mild 
green mosaic virus (TMGMV), cowpea mosaic virus (CMV), alfalfa mosaic virus 
(AMV), Cucumber green mottle mosaic virus - watermelon strain (CGMMV-W), 
oat mosaic virus (OMV), 

(b) viruses from the brome mosaic virus (BMV) group, such as BMV, broad bean 
mottle virus and cowpea chlorotic mottle virus, 

(c) other viruses such as rice necrosis virus (RNV), geminiviruses such as Tomato 
Golden Mosaic virus (TGMV), Cassava Latent virus (CLV) and Maize Streak virus 
(MSV). 

A preferred host is Nicotiana benthamiana. The host plant, as the term is used 
here, may be a whole plant, a plant cell, a leaf, a root, a shoot, a flower or any other plant 
part. The plant or plant cell is grown using conventional methods. 

A preferred viral vector for use with N. benthamiana is expression vector 
pBSG1250 (pTTOSA derivative) containing a hybrid fusion of TMV and tomato mosaic 
virus (ToMV) (Kumagai, M.H. et al. (1995) Proc. Natl. Acad. Sci. USA 92:1679-1683). 
The inserted subgenomic promoters must be compatible with TMV nucleic acid and 
capable of directing transcription of properly situated (e.g., adjacent) nucleic acid 
sequences in the infected plant. The coat protein should permit the virus to systemically 
infect the plant host. TMV coat protein promotes systemic infection of N. 
benthamiana. 

Infection of the plant with the recombinant viral vector is accomplished using a 
number of conventional techniques known to promote infection. These include, but are 
not limited to, leaf abrasion, abrasion in solution and high velocity water spray. The 
viral vector can be delivered by hand, mechanically or by high pressure spray of single 
leaves. 

Purification of the Protein/Polypeptide Product 

The dual-domain polypeptide produced in plants is preferably recovered and 
purified using standard techniques. Suitable methods include homogenizing or grinding 
the plant or the producing plant parts in liquid nitrogen followed by extraction of 
protein. If for some reason it is not desirable to homogenize the plant material, the 
polypeptide can be removed by vacuum infiltration and centrifugation followed by 
sterile filtration. Protein yield may be estimated by any acceptable technique. 
Polypeptides are purified according to size, isoelectric point or other physical property. 
Following isolation of the total secreted proteins from the plant material, further 
purification steps may be performed. Immunological methods such as 
immunoprecipitation or, preferably, affinity chromatography, with antibodies specific 
for epitopes of the desired polypeptide may be used. 

To facilitate purification, the viral vector can be engineered so that the protein is 
produced with an affinity tag that can be exploited at the purification stage. An 
example of such a tag is the histidine (His) tag that permits purification on a metal 
(e.g., nickel) affinity column. Other affinity tags are well-known in the art and need not 
be described here. 

Various solid supports may be used in the present methods: agarose®, 
Sephadex®, derivatives of cellulose or other polymers. For example, staphylococcal 
protein A (or protein L) immobilized to Sepharose® can be used to isolate the target 
protein by first incubating the protein with specific antibodies in solution and contacting 
the mixture with the immobilized protein A which binds and retains the antibody-target 
protein complex. 

Using any of the foregoing or other well-known methods, the polypeptide is 
purified from the plant material to a purity of greater than about 50%, more preferably 
greater than about 75%, even more preferably greater than about 95%. 

Determination of Correct Folding 

Critical for certain properties such as immunogenicity is the protein's 
conformation in solution. The conformation of the relevant epitopes of the dual-domain 
polypeptide in solution preferably resembles or mimics that of the same epitopes of the 
native protein. By producing polypeptides in plants, and targeting them to the plant's 
secretory pathway, the present invention ensures that the polypeptide is secreted in 
soluble, optimally folded form. 

A preferred reagent to be used in determining proper folding is a specific 
antibody, preferably a mAb, which (1) binds to an epitope of the polypeptide when the 
chains are correctly folded but (2) does not bind when the epitopes are denatured. The 
antibody is employed in any of a number of immunological assays, including dot blot, 
western blot, immunoprecipitation, radioimmunoassay (RIA), and enzyme 
immunoassays (EIA) such as enzyme-linked immunosorbent assays (ELISA). In 
preferred embodiments, when such antibodies are available, Western blots and ELISAs 
are employed to verify correct folding of the relevant parts of the dual domain (or multi- 
domain) polypeptide produced in the plant. 

Additional Analysis of the Dual-Domain Molecule 

DNA encoding the dual domain polypeptide can be sequenced, yielding a 
deduced amino acid sequence of its encoded product. If the DNA molecule has been 
subcloned, it can be excised from the vector with a restriction enzyme and the resulting 
fragments analyzed on agarose gels to determine the size of the fragments. 

If the DNA molecule itself has the binding domains of interest, the subcloned 
DNA molecule (or excised fragment) can be assayed for binding to the relevant ligand. 

If the DNA molecule encodes a dual-domain ribozyme, then the ribozyme RNA 
can be transcribed from the vector. The coding sequence can be excised with restriction 
enzymes and contacted with an RNA polymerase (along with ribonucleotides and other 
required factors) to transcribe the dual-domain RNA. The ribozyme can then be 
quantified and its enzymatic activity measured in an appropriate assay. 

A DNA molecule encoding a dual-domain polypeptide is first expressed. If 
desired, the DNA can be additionally modified to include sequences that will permit or 
optimize expression in an appropriate host or in an in vitro transcription/translation 
system. Once expressed, the polypeptide is then subjected to appropriate functional 
assays, e.g., measurement of enzymatic activity (of either domain). Also the quantity 
and physical properties of the dual domain polypeptide can be determined, e.g., by SDS- 
PAGE. Electrophoretic separation can be followed by direct staining of protein or by 
Western blotting and probing with an appropriate antibody that recognizes an epitope of 
either domain. If a domain has binding activity, or other functions as have been 
described above, this can also be measured by conventional means. 

Having now generally described the invention, the same will be more readily 
understood through reference to the following examples which are provided by way of 
illustration, and are not intended to be limiting of the present invention, unless specified. 

The following examples are provided by way of illustration only and not by way 
of limitation. Those of skill will readily recognize a variety of noncritical parameters 
which could be changed or modified to yield essentially similar results. 

EXAMPLE 1 

Generation of a Self/Tumor Antigen from a Single Patient (CJ) that Includes 
the Idiotype of CJ B Cell Lymphoma 

The immunogenic scFv protein designated "CJ" was derived from a human 
lymphoma patient (having the initials CJ) and had as its linker (Gly4Ser)3. Patient CJ 
had been treated in an earlier passive immunotherapy trial. The CJ molecule 
(specifically, its V region epitope or epitopes) is recognized by an anti-Id mAb named 
7D11 (see also McCormick, AA et al., Proc. Natl. Acad. Sci. USA (1999) 96:703-708). 

In an initial attempt to make a human scFv polypeptide, CJ V region genes were 
sequenced and cloned into a bacterial expression system using a (Gly3Ser)4 linker. 
Although targeted to the periplasm with a PEL-b leader, CJ scFv protein was 
sequestered in insoluble inclusion bodies. When mice were immunized with CJ scFv 
made in bacteria, no anti-CJ anti-idiotype antibody responses were detected. 

Derivatives of CJ were generated by producing linkers of random length 
and sequence as part of the general PCR-based cloning strategy described herein. 

Four reactions were carried out. In the first and second, the sequence encoding 
the Vh domain was amplified from a cDNA clone of the lymphoma cells from patient 
CJ using the following synthetic oligonucleotides: 

VhF: 5' gtg gca tac agg ttc aac tgg tgg agt ctg (SEQ ID NO:4) 
VhR: 5' (asy)x tga gga gac ggt gac cag ggt tc (SEQ ID NO:5) 

The SphI restriction site is underscored. In the first reaction x was 6 (SEQ ID NO:6): 

asy asy asy asy asy asy tga gga gac ggt gac cag ggt tc 

In the second reaction, x was 9 (SEQ ID NO:7): 

asy asy asy asy asy asy asy asy asy tga gga gac ggt gac cag ggt tc 

(In general, the number of triplets (x) can be 1 to about 50.) 
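
The degenerate triplet asy can be unpacked with a short Python sketch. It relies only on the standard IUPAC nucleotide codes (r = a or g, y = c or t, s = c or g) and the standard genetic code; the function names are illustrative assumptions, not part of the disclosure. Because VhR is a reverse primer, each asy triplet appears on the coding strand as its reverse complement, and the sketch shows which codons and amino acids that implies. The library size it prints is a nominal figure for a linker of 12 degenerate triplets in total, assuming every triplet is incorporated as designed.

```python
from itertools import product

# Standard IUPAC codes, restricted to the symbols used in the primers.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "s": "cg"}
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c",
              "r": "y", "y": "r", "s": "s"}
# Standard genetic code, restricted to the codons that arise below.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def expand(degenerate):
    """List every concrete sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


sense_triplet = reverse_complement("asy")
print(sense_triplet)                      # rst
codons = expand(sense_triplet)
print(codons)                             # ['act', 'agt', 'gct', 'ggt']
print(sorted(CODON_TO_AA[c] for c in codons))  # ['A', 'G', 'S', 'T']
print(len(codons) ** 12)                  # 16777216 nominal 12-codon linkers
```

Each degenerate position therefore encodes one of four small residues (Thr, Ser, Ala or Gly), which is consistent with the linker sequences reported for this example.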

In the third and fourth PCR reactions, the sequence encoding the Vl domain was 
amplified from a cDNA clone of CJ using the following synthetic oligonucleotides: 

VlF: 5' (rst)z gac att cag atg acc cag tct cct tc (SEQ ID NO:8) 

VlR: 5' cac cct agg cta tcg ttt gat cag tac ctt ggt ccc ctg (SEQ ID NO:9) 

The AvrII site is underscored. In the third reaction z was 6 (SEQ ID NO:10): 

rst rst rst rst rst rst gac att cag atg acc cag tct cct tc 

In the fourth reaction, z was 9 (SEQ ID NO:11): 

rst rst rst rst rst rst rst rst rst gac att cag atg acc cag tct cct tc 

(In general, the number of triplets (z) can be 1 to about 50.) 

Following amplification, the four PCR products were purified and digested with 
SphI for the Vh chain PCR product and AvrII for the Vl chain PCR product. The 
digests were electrophoresed on agarose gels and the four digested PCR fragments 
were purified, combined and ligated into a Geneware® expression vector pBSG1250 
(pTTOSA derivative) containing a hybrid fusion of TMV and ToMV (Kumagai, et al., 
supra) that had been digested with the restriction enzymes SphI and AvrII. In the 
particular Geneware® vector, the SphI site lies downstream of the TMV U1 CP 
subgenomic promoter and the α-amylase signal peptide sequence. The SphI site in the 
primer VhF is in-frame with the SphI site in the α-amylase signal peptide sequence. 
After ligation of both the Vh and Vl PCR fragments into the Geneware® vector, the 
DNA was treated with polynucleotide kinase and ATP to incorporate phosphates at the 
blunt 5' ends of the initial PCR products. 

Following the kinase reaction, the DNA was ligated back upon itself, to generate 
circular plasmids. The ligated DNA was transformed into E. coli (using electroporation), 
and the transformed cells were plated on selective media containing 50 µg/ml 
ampicillin. Plasmid DNA was purified from individual ampicillin-resistant E. coli 
colonies and transcribed with T7 RNA polymerase to generate infectious transcripts of 
individual clones. 

Transcripts were transfected into N. tabacum plant protoplasts using a 
PEG-based transfection protocol essentially as described in Lindbo et al. Plant Cell 
5:1749-1759 (1993), and transfected protoplasts were incubated in protoplast culture 
medium for several days. The latter medium contained 265 mM mannitol, 1X 
Murashige minimal organics medium (Gibco/BRL), 1.5 mM KH2PO4, 0.2 µg/ml 
2,4-dichlorophenoxyacetic acid, 0.1 µg/ml kinetin, and 5% coconut water (Sigma). 
Protoplasts were cultured at a density of about 10^ cells/ml. Plasmid DNA was purified 
from at least 10 to 50 individual colonies from each cloning experiment. 

Approximately 1-4 days after transfection, protein samples were collected from 
the individual protoplast samples. Culture medium (200-500 was concentrated about 
10-fold by speed vacuum evaporation or Microcon sample concentrator. 

Since this cloning strategy included a signal peptide sequence designed to 
promote secretion of the protein product by the plant cells into the culture medium, 
medium samples were also analyzed by SDS-PAGE followed by Coomassie blue 
staining and/or by Western blotting. 

The starting scFv incorporated the standard (Gly4-Ser)3 linker sequence; the 
other scFv chains were randomly selected from the transformants obtained from the 
linker library cloning experiment that utilized the cloned PCR products generated from 
the four primers (SEQ ID NO:4-11, above). Culture supernatants from equivalent 
numbers of cells were electrophoresed (SDS-PAGE), and the gels were transferred to 
nitrocellulose membranes for Western analysis with mAb 7D11 (see above). 

Some selected linker library members that were screened randomly appeared to 
express and accumulate as much or more CJ protein as did the CJ scFv having the 
conventional linker (Gly4-Ser)3. 

DNA of those library members expressing particularly high amounts of CJ scFv 
was sequenced. Results are shown in Table 2. Plasmid DNAs for select clones were 
prepared and sequenced by standard methods. From the nucleotide sequences of the 
various CJ-derived constructs, the linker sequence of individual clones was deduced. 
Table 2 lists some of the nucleotide and amino acid linker sequences obtained and 
indicates "relative expression" which means the amount of expression relative to the 
same protein but with the (Gly4Ser)3 linker. 

Table 2 

Analysis of select members of the CJ linker library experiment in plant protoplasts 

Clone  Linker Region Nucleotide Sequence (lower case) and            SEQ ID NO:  Length (aa)  RE*
       Amino Acid Sequence (upper case)

#24    actactgctactggtgctagtactactgctggtgctagt                       12          13           ++
       TTATGASTTAGAS                                                 13

#36    gctactgctgctagtggtgctgctgctggtggtggtact                       14          13           +
       ATAASGAAAGGGT                                                 15

#37    gctactggtgctagtactagtgctactgctggtggtagt                       16          13           ++
       ATGASTSATAGGS                                                 17

#20    agtactgctgctggtactagtagtggtagtagtactggt                       18          13           ++
       STAAGTSSGSSTG                                                 19

#12    gctagtactgctactagtagtggtggtggtggtactggtagtagtgctgct           20          17
       ASTATSSGGGTGSSAAA                                             21

#16    gctactagtactgctgctgctggtgctactagtgctactggtggtgctagtggtactggt  22          20           +++
       ATSTAAAGATSATGGASGTG                                          23

#30    actggtgctagtggtgctactagtagtggtagtagtagt                       24          13           +++
       TGASGATSSGSSS                                                 25

* RE = Relative Expression to the (Gly4Ser)3 clone 


DNA sequencing revealed that the clones did not have the same nucleotide or 
amino acid sequences but rather demonstrated amino acid and nucleotide length 
diversity. Table 2 shows a sampling of clones with L's ranging from 13 to 20 amino 
acids. This range was apparently a result of mispriming during PCR amplification of 
the Vh and Vl coding sequences. Since the linker coding sequences of the 
oligonucleotides used in this experiment contain stretches of low complexity nucleotide 
sequences (i.e., (asy)x or (rst)z), multiple mispriming events are likely. In 
conjunction with DNA polymerase/exonuclease activities present during PCR, this 
could lead to an increase or a decrease in the number of codons comprising the L 
sequences. 
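
The relationship between the nucleotide and amino acid entries of Table 2 can be checked with a brief Python sketch. It is offered as an illustration only; the function name `translate_linker` is an assumption, and the codon table is the standard genetic code restricted to the four codons that occur in these linkers. The two sequences are the nucleotide entries listed in Table 2 for clones #24 and #30.

```python
# Standard genetic code, restricted to the codons found in these linkers.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def translate_linker(nt):
    """Translate a linker nucleotide sequence into one-letter amino acids."""
    nt = nt.lower()
    if len(nt) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TO_AA[nt[i:i + 3]] for i in range(0, len(nt), 3))


clone_24 = "actactgctactggtgctagtactactgctggtgctagt"
clone_30 = "actggtgctagtggtgctactagtagtggtagtagtagt"

for name, nt in (("#24", clone_24), ("#30", clone_30)):
    aa = translate_linker(nt)
    print(name, aa, len(aa))
# #24 TTATGASTTAGAS 13
# #30 TGASGATSSGSSS 13
```

Both linkers are 13 residues long, a length that is not a sum of the designed triplet counts (6 or 9 per primer), which is in keeping with the mispriming explanation given above.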

The quantities of CJ scFv protein produced also varied (relative to the CJ scFv 
with the (Gly4Ser)3 linker). This indicates that both the length and the sequence of the 
linker region affect the amount of protein produced by the plant cells or plants. 

EXAMPLE 2 
Expression of scFv Product in Whole Plants 

The process described in Example 1 is repeated except that whole plants are used 
along with a suitable expression system for producing the scFv products. 

Expressed products are screened by SDS-PAGE/Coomassie blue staining and/or 
Western blotting. The results indicate a varied amount of scFv product produced. The 
highest yielding clones are selected for production of the vaccine scFv. 

Expression system 

The DNA fragments encoding the dual-domain scFv fragments having the V 
regions of the CJ human lymphoma were generated as in Example 1 and cloned into 
vector pBSG1250. In this vector, a TMV coat protein subgenomic promoter is located 
upstream of the insertion site of the CJ sequence. Following infection, this TMV coat 
protein subgenomic promoter directs initiation of the CJ RNA synthesis in plant cells at 
the transcription start point ("tsp"). The rice α-amylase signal peptide (O'Neill, SD et 
al. (1990) Mol. Gen. Genet. 221:235-244), fused in-frame to the CJ sequence, encodes a 
31 residue polypeptide which targets proteins to the secretory pathway (Firek, S. et al. 
(1994) Transgenic Res. 3:326-331), and is subsequently cleaved off between the 
C-terminal Gly of the signal peptide and the N-terminal Met of the expressed CJ scFv 
protein. The sequence encoding CJ scFv has been introduced between the 30K 
movement protein and the ToMV coat protein (Tcp) genes. A T7 phage promoter has 
been introduced upstream of the viral cDNA, allowing for transcription of infective 
genomic plus-strand RNA. 

Capped infectious RNA was made in vitro from 1 µg plasmid, using a T7 
message kit from Ambion. Synthesis of the message was quantified by gel 
electrophoresis and approximately 2 µg of the in vitro transcribed viral RNA was 
applied with an abrasive to the lower leaves (approximately 1-2 cm in size) of N. 
benthamiana (Dawson, WO et al. (1986) Proc. Natl. Acad. Sci. USA 83:1832-1836). 
Transcription of subgenomic RNA encoding the CJ scFv protein was initiated after 
infection at the indicated transcription start point. High levels of subgenomic RNA 
species were synthesized in virus-infected plant cells (Kumagai, MH, et al. (1993) Proc. 
Natl. Acad. Sci. USA 90:427-430), and served as templates for the translation and 
subsequent accumulation of CJ scFv protein. 

Characterization of clones 

Signs of infection were visible after 5-6 days as mild leaf deformation, with 
some variable leaf mottling and growth retardation. Eleven to fourteen days post 
inoculation, the secreted proteins were isolated. Leaf and stem material was harvested, 
weighed and then subjected to a 700 mm Hg vacuum for 2 min in infiltration buffer 
(100 mM Tris HCl, pH 7.5 and 2 mM EDTA). Secreted proteins (hereafter termed 
"interstitial fraction" or "IF") were recovered from infiltrated leaves by mild 
centrifugation at 2000g (Beckman JA-14) on supported nylon mesh discs and concentrated 
approximately 10-fold in Centricon-10 (Amicon) concentrators. Total protein was 
measured by the Bradford method (Bradford, M. (1976) Anal. Biochem. 72:248-254) 
and stored at -80°C until used. 

The secreted material was analyzed for the presence of soluble CJ scFv protein 
by SDS-PAGE followed by Western blot with CJ mAb 7D11. About 3 µg of IF 
protein were separated by SDS-PAGE and transferred to nitrocellulose membrane in 
standard Tris-glycine buffer with 20% methanol at 150V for 1 hour. After transfer, 
blots were treated for 20 minutes at room temperature with blocking buffer (50 mM Tris 
pH 8, 150 mM NaCl, 1 mM EDTA, 2.5% non-fat dry milk, 2.5% BSA and 0.05% Tween 
20) followed by a 16 hr incubation at 4°C in blocking buffer plus 1 µg/ml purified 7D11 
antibody. After three 15 minute washes (100 mM Tris pH 8, 150 mM NaCl, 1 mM 
EDTA and 0.1% Tween 20), membranes were incubated for 1 hour in blocking buffer 
plus 1 µg/ml goat anti-mouse IgG-HRP (Southern Biotechnology). After three 15 
minute washes, Western blots were developed by Enhanced Chemiluminescence (ECL) 
(Amersham) according to the manufacturer's instructions. Exposure times ranged from 1 to 
5 seconds. No cross reactivity to plant proteins was observed (testing IF extracts from 
control infected plants). 

Individual clones were sequenced, analyzed for reading frame and amino acid 
identity to the original CJ Ig sequence and then screened for protein expression in 
infected plants. Figure 1 shows the results of 9 individual CJ scFv expressing clones 
that demonstrated various levels of protein accumulation. Clones 20 and 30 showed 
high levels of expression, as well as accumulation of protein dimers. Clone C contained 
a modification of the (Gly3Ser)4 linker. 

From the sequence data, the linker sequences for individual clones were 
deduced. The clone numbers in Table 3 are the same as those listed in Table 2. As 
above, relative expression relates to the scFv protein having (Gly4Ser)3 linker. 

As above, differences were observed in the expression of various CJ scFv-based 
clones in whole plants. Interestingly, some clones that were expressed in plant 
protoplasts were not expressed in whole plants. For example, clone #16 which was 
strongly expressed in plant protoplasts was apparently not expressed in whole plants. 
Nevertheless, the methods disclosed for generating the linker regions with varying 
length and sequence permit the screening of large numbers of clones for their expression 
in either plant protoplasts or whole plants. 



Table 3 

Analysis of select members of the CJ linker library experiment in whole plants 

Clone  Linker Region Nucleotide Sequence (lower case) and            SEQ ID NO:  Length (aa)  RE*
       Amino Acid Sequence (upper case)

#24    actactgctactggtgctagtactactgctggtgctagt                       12          13
       TTATGASTTAGAS                                                 13

#36    gctactgctgctagtggtgctgctgctggtggtggtact                       14          13           +
       ATAASGAAAGGGT                                                 15

#37    gctactggtgctagtactagtgctactgctggtggtagt                       16          13           ++
       ATGASTSATAGGS                                                 17

#20    agtactgctgctggtactagtagtggtagtagtactggt                       18          13           ++
       STAAGTSSGSSTG                                                 19

#12    gctagtactgctactagtagtggtggtggtggtactggtagtagtgctgct           20          17           +
       ASTATSSGGGTGSSAAA                                             21

#30    actggtgctagtggtgctactagtagtggtagtagtagt                       24          13           +++
       TGASGATSSGSSS                                                 25

* RE = Relative Expression to the (Gly4Ser)3 clone 

The quality of CJ protein, optimized by the random linker library, was validated 
by two methods. First, CJ protein was purified by affinity chromatography using 
immobilized 7D11 anti-idiotype mAb. This method requires that the CJ protein bind to 
the anti-Id column under physiological conditions. Such binding will not occur if the 
protein is not folded correctly. Protein was bound under normal pH and was eluted by 
50 mM diethylamine pH 11.5, then immediately dialyzed against normal saline. 
Material was quantitated by ELISA using 7D11 and using standard protein 
determination. 

The second, more stringent, assay for the quality of the CJ protein was a 
functional assay in animals. Clone CJLL20 (for Linker Library pick #20) was purified 
by 7D11 affinity chromatography and administered to five mice in 3 bi-weekly 
immunizations of 50 µg each. Ten days after the third injection, serum was sampled. 
Using the native idiotype (ID 12), or an isotype-matched irrelevant human antibody in a 
sandwich ELISA, the sera were tested for specific responses to the CJ idiotype. Results 
are shown in Figure 2. 

Non-specific antibody responses to xenogeneic human Ig determinants were 
present in only 3 of the 5 animals and in very low amounts (detected as minimal cross- 
reactivity of the murine sera to an unrelated human antibody). 

The sera of all 5 mice had high titers of anti-CJ antibodies (Figure 2). Thus, the 
immune response induced by the dual-domain scFv polypeptide was highly specific for 
the original Vh and Vl domains of the original Ig, as predicted and as desired. These 
results suggested that the protein produced in plants was folded correctly so that it could 
induce an appropriate immune response when administered to subjects. 

EXAMPLE 3 
Expression of scFv Product in Whole Plants 

The process described in Example 2 was repeated except that a different human 
scFv with unknown expression characteristics was used along with a suitable expression 
system for producing the scFv products. 

Expressed products were screened by SDS-PAGE/Coomassie blue staining. The 
results indicated that the amount of scFv product produced varied based on linker 
composition. The highest yielding clones are selected for production of a vaccine scFv. 

Expression system 

The DNA fragments encoding the dual-domain scFv fragments having the V 
regions of the Go 19 human lymphoma were generated as in Example 1 and cloned into 
pl324-MBP, a modified 30B vector (Shivprasad, S. et al. (1999) Virology 255:312- 
323), containing a hybrid fusion of TMV and TMGMV-U5 as well as the rice α-amylase 
signal peptide with Sph I and Avr II insert cloning sites. 

In this vector, a TMV coat protein subgenomic promoter is located upstream of 
the insertion site of the Go 19 sequence. Following infection, this TMV coat protein 
subgenomic promoter directs initiation of Go 19 RNA synthesis in plant cells at the 
transcription start point ("tsp"). The rice α-amylase signal peptide (O'Neill, SD et al. 
(1990) Mol. Gen. Genet. 221:235-244), fused in-frame to the Go 19 sequence, encodes a 
31 residue polypeptide which targets proteins to the secretory pathway (Firek, S. et al. 
(1994) Transgenic Res. 3:326-331), and is subsequently cleaved off between the 
C-terminal Gly of the signal peptide and the N-terminal Met of the expressed Go 19 scFv 
protein. The sequence encoding Go 19 scFv was introduced between the 30K 
movement protein and the TMGMV-U5 coat protein (Tcp) genes. A T7 phage RNA 
polymerase promoter was introduced upstream of the viral cDNA, allowing for 
transcription of infective genomic plus-strand RNA. 

The Go 19 V regions were amplified in four separate PCR reactions. In the first 
and second reactions, the sequence encoding the Vh domain was amplified from a 
cDNA clone derived from the lymphoma cells of patient Go 19 using the following 
synthetic oligonucleotides: 

VhF: 5' cct gca tgc tgg agg tgc agt tgg tgg aat c (SEQ ID NO:26) 

VhR: 5' (asy)x aga gga gac ggt gac cat ga (SEQ ID NO:27) 

VhF contains the SphI restriction site (gcatgc). In the first reaction, x was 4 (SEQ ID NO:28): 

5'-asy asy asy asy aga gga gac ggt gac cat ga 
In the second reaction, x was 9 (SEQ ID NO:29): 

5'-asy asy asy asy asy asy asy asy asy aga gga gac ggt gac cat ga 
(In general, the number of triplets (x) can be 1 to about 50) 

In the third and fourth PCR reactions, the sequence encoding the Vl domain was 
amplified from a cDNA clone of Go 19 using the following synthetic oligonucleotides: 

VlF: 5' (rst)z cag tct gcc ctg act cag t (SEQ ID NO:30) 

VlR: 5' cac cct agg tca acc aag gac ggt cag gtt ggt c (SEQ ID NO:31) 

VlR contains the Avr II restriction site (cctagg). In the third reaction, z was 6 (SEQ ID NO:32): 

5'-rst rst rst rst rst rst cag tct gcc ctg act cag t 
In the fourth reaction, z was 9 (SEQ ID NO:33): 

5'-rst rst rst rst rst rst rst rst rst cag tct gcc ctg act cag t 
(In general, the number of triplets (z) can be 1 to about 50) 
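
The degenerate triplets in these primers can be read with the standard IUPAC nucleotide codes (r = a or g, s = c or g, y = c or t). The following Python sketch is illustrative only and is not part of the disclosed method; the function names are hypothetical. It expands "rst" into the codons it can specify, shows that "asy" is the reverse complement of "rst" (so that both primers contribute the same codon set to the coding strand), and counts the nominal number of distinct sequences for a given number of triplets, assuming every position varies independently.

```python
from itertools import product

# Standard IUPAC codes needed for the "asy" and "rst" triplets.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "s": "cg"}
# Complement of each code (s is its own complement; r and y pair).
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c",
              "r": "y", "y": "r", "s": "s"}
# Standard genetic code, restricted to the four codons "rst" can specify.
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}


def expand(degenerate):
    """List every concrete sequence matched by a degenerate sequence."""
    choices = (IUPAC[base] for base in degenerate)
    return sorted("".join(p) for p in product(*choices))


def reverse_complement(seq):
    """Reverse-complement a sequence written in IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nominal_library_size(n_triplets):
    """Number of distinct sequences for n independent 'rst' triplets."""
    return len(expand("rst")) ** n_triplets


codons = expand("rst")
print(codons)                              # codons specified by "rst"
print([CODON_TO_AA[c] for c in codons])    # residues they encode
print(reverse_complement("asy"))           # coding-strand view of "asy"
print(nominal_library_size(4))             # e.g. x = 4 triplets
```

Under these assumptions each triplet encodes Thr, Ser, Ala or Gly, which matches the residues seen in the sequenced linkers.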

Prior to PCR amplification, the VhR and VlR oligonucleotides were treated with 
polynucleotide kinase and ATP to add phosphates at the 5' end of the oligonucleotides. 
Following amplification, the four PCR products were purified and the Vh and Vl products 
were ligated together to create the scFv. The scFv ligation products were re-purified and 
restriction digested with SphI and Avr II, and the digested scFv was gel isolated and ligated 
into the Geneware® vector. The ligated DNA was transformed into E. coli (using 
electroporation), and the transformed cells were plated on selective media containing 50 
µg/ml ampicillin. Plasmid DNA was purified from individual ampicillin-resistant E. 
coli colonies. 
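
Because underscoring does not survive in plain text, the cloning sites used for the SphI/Avr II digest can be located programmatically. The sketch below is illustrative only: it assumes the usual recognition sequences gcatgc (SphI) and cctagg (Avr II), takes the primer sequences from SEQ ID NO:26 and SEQ ID NO:31, and uses a hypothetical helper name.

```python
# Recognition sequences assumed for the two cloning enzymes.
RECOGNITION_SITES = {"SphI": "gcatgc", "AvrII": "cctagg"}

# Outer primers as listed in SEQ ID NO:26 (VhF) and SEQ ID NO:31 (VlR).
PRIMERS = {
    "VhF": "cct gca tgc tgg agg tgc agt tgg tgg aat c",
    "VlR": "cac cct agg tca acc aag gac ggt cag gtt ggt c",
}


def find_site(primer, site):
    """Return the 0-based offset of site in primer (spaces ignored), or -1."""
    return primer.replace(" ", "").find(site)


for name, enzyme in (("VhF", "SphI"), ("VlR", "AvrII")):
    offset = find_site(PRIMERS[name], RECOGNITION_SITES[enzyme])
    print(name, enzyme, offset)
```

In both primers the site begins after a three-nucleotide 5' overhang, which is a common way to leave the enzyme some flanking sequence at the end of a PCR product.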

Capped infectious RNA was made in vitro from approximately 0.5 µg plasmid, 
using a T7 message kit from Ambion. Synthesis of the message was evaluated by gel 
electrophoresis, and approximately 2 ng of the in vitro transcribed viral RNA was 
encapsidated with purified TMV-U1 coat protein in 100 mM sodium phosphate, pH 7.0 
at room temperature for a minimum of 6 hours. Encapsidated transcripts were applied 
with an abrasive to the lower leaves (approximately 1-2 cm in size) of N. benthamiana 
(W.O. Dawson et al. (1986) Proc. Natl. Acad. Sci. USA 83:1832-1836). Transcription of 
subgenomic RNA encoding the Go 19 scFv protein was initiated after infection at the 
indicated transcription start point. High levels of subgenomic RNA species were 
synthesized in virus-infected plant cells (M.H. Kumagai et al. (1993) Proc. Natl. Acad. 
Sci. USA 90:427-430), and served as templates for the translation and subsequent 
accumulation of Go 19 scFv protein. 

Characterization of clones 

Signs of infection were visible after 5-6 days as mild leaf deformation, with 
some variable leaf mottling and growth retardation. Eleven to fourteen days post 
inoculation, the secreted proteins were isolated. Approximately 0.1 g of infected leaf 
material was harvested, placed into a 96-well glass fiber filtration block 
(Whatman/Polyfiltronics), and submerged in infiltration buffer (20 mM Tris HCl, pH 7.0, 
10 mM 2-mercaptoethanol). The tissue was subjected to a 700 mm Hg vacuum for 30 
seconds, the vacuum was released, and the vacuum process was repeated for at least one 
additional round. Residual buffer was removed by a low speed spin at 30 x g in a plate centrifuge. 
Secreted proteins (hereafter termed "interstitial fraction" or "IF") were recovered from 
infiltrated leaves by mild centrifugation at 1700 x g in a plate centrifuge and collected 
into a 96 well polypropylene plate. 

The secreted material was analyzed for the presence of soluble Go 19 scFv 
protein by SDS-PAGE. IF (27 µl containing approximately 5 µg of protein) was 
separated by SDS-PAGE. Linkers from individual clones were sequenced, analyzed for 
reading frame and amino acid content and then screened for protein expression in 
infected plants. Figure 3 shows the results of 22 individual Go 19 scFv expressing 
clones that demonstrated various levels of protein accumulation. Clones C5, E1 and 
E9 showed high levels of expression with minimal protease degradation. 

From the sequence data, the linker sequences for individual clones were deduced 
as shown in Table 4. 

Table 4. 

Analysis of select members of the Go 19 linker library experiment in whole plants 

Clone   Linker Region Nucleotide Sequence (lower case) and   SEQ ID   Length   RE*
        Amino Acid Sequence (upper case)                     NO

#C5     ggtgctggtggtggt                                      34       5
        GAGGG                                                35

#C10    actggtggtggtggtggtagtggtggtggt                       36       10       ***
        TGGGGGSGGG                                           37

#C11    actactactactgctactactgctggtagtggtgct                 38       12       **
        TTTTATTAGSGA                                         39

#E1     gctagtactggtgct                                      40       5
        ASTGA                                                41

#E9     agtactggtagtagtggtgctggt                             42       8
        STGSSGAG                                             43

#E3     gctagtagtggtgctagtgct                                44       7        *
        ASSGASA                                              45

#C4     gctagtggtggtactgctggtactggtggtagtagtact              46       13
        ASGGTAGTGGSST                                        47

#E4     actagtggtagtggtgctagtgctgctgctggtggtgctgctgctagtgct  48       17
        TSGSGASAAAGGAAASA                                    49

RE* = Relative Expression to Go 19 scFv library clones 
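
The amino acid sequences and lengths in Table 4 can be cross-checked against the nucleotide sequences. The following Python sketch is illustrative only; it assumes the standard genetic code, restricted to the four codons that the "rst" pattern can specify, and the helper name is hypothetical.

```python
# Standard genetic code, restricted to the codons specified by "rst".
CODON_TO_AA = {"act": "T", "agt": "S", "gct": "A", "ggt": "G"}

# Linker nucleotide sequences as listed in Table 4.
LINKERS = {
    "C5": "ggtgctggtggtggt",
    "C10": "actggtggtggtggtggtagtggtggtggt",
    "C11": "actactactactgctactactgctggtagtggtgct",
    "E1": "gctagtactggtgct",
    "E9": "agtactggtagtagtggtgctggt",
    "E3": "gctagtagtggtgctagtgct",
    "C4": "gctagtggtggtactgctggtactggtggtagtagtact",
    "E4": "actagtggtagtggtgctagtgctgctgctggtggtgctgctgctagtgct",
}


def translate(nucleotides):
    """Translate an in-frame linker; raises if a codon is not an 'rst' codon."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("linker is not a whole number of triplets")
    return "".join(CODON_TO_AA[nucleotides[i:i + 3]]
                   for i in range(0, len(nucleotides), 3))


for clone, nucleotides in LINKERS.items():
    peptide = translate(nucleotides)
    print(clone, peptide, len(peptide))
```

A KeyError or ValueError from translate would flag a linker that has left the reading frame or picked up a codon outside the degenerate pattern, which is the kind of reading-frame check described above for the sequenced clones.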



As above, differences were observed in the expression of various Go 19 scFv-based 
clones in whole plants as well as the degree of degradation indicated by the presence of 
protein accumulation between the 6.5 kDa and 21 kDa marker bands. The methods 
disclosed for generating the linker regions with varying length and sequence permit the 
screening of large numbers of clones for their expression in either plant protoplasts or 
whole plants. 



EXAMPLE 4 
scFv-Detectably Labeled Conjugates 

A mAb to HER-2/neu inhibits growth of cells of the breast cancer cell line 
SK-Br-3 (ATCC HTB 30) in 6 day culture. Such treatment sensitizes these cells to 
chemotherapeutic agents (US 5,677,171). 

The process of Example 1 is repeated using the Vh and Vl regions of an scFv that 
specifically binds the HER-2/neu (erbB-2) protein. The scFv gene encoding such a 
polypeptide is described in Wels et al., Biotechnology 10:1128-1132 (1992). Using the 
same repeated triplet nucleotide sequences as in Example 1, the 3' end of the erbB-2 
scFv DNA construct is linked to the 5' end of the horseradish peroxidase gene using 
appropriate PCR primers modeling the method in Example 1. 

High yielding clones are identified by measuring for peroxidase activity in the 
supernatant. High affinity and avidity are determined by immunohistochemical 
detection, with substrate and chromophore on control samples of a breast cancer cell 
line that overexpresses HER-2/neu. Comparisons are made to conventional labeled 
mAbs to HER-2/neu (such as DAKO HercepTest, Dako Corp., Carpinteria, CA) to 
determine which clones produce acceptable scFv proteins. 

EXAMPLE 5 
scFv-Toxin Conjugate Production 

The process of Example 4 is repeated, with the following modification. The 
gene for the ricin A chain is linked to the 3' end of the scFv DNA construct through the 
linker region of this invention (made up of repeated triplet nucleotides). 

The plant cell clones are grown in 24 well plates and screened initially by 
measuring secreted protein (PAGE followed by Coomassie blue staining). Two day 
culture supernatants from the wells in which each clone is growing are tested for 
cytotoxic activity toward target cells by incubation with active cultures of SK-Br-3 in 
six well plates (Costar). Cytotoxicity against these targets is determined 48 hours later 
by microscopic inspection. 

High producing clones that generate strong cytotoxicity are selected. Calluses 
are formed from these cultures to regenerate plants for field growth and large scale 
production. 

Humanized mAb to HER-2/neu is an FDA approved therapeutic for breast 
cancer (HERCEPTIN, Genentech, Inc., South San Francisco, CA). It is expected that 
toxin-conjugated scFv specific for the same antigen will be at least equally and probably 
more cytotoxic to human breast cancer cells. 

EXAMPLE 6 
Production of Dual-Domain Ribozymes 

The process of Example 1 is repeated except that DNA encoding two different 
ribozyme domains is used. The vector that contains the subcloned dual ribozyme 
domains is transcribed to produce RNA with the properties of the respective ribozyme 
domains. 

The amount of transcribed RNA product can be determined by hybridization 
with an oligonucleotide probe, by spectrophotometric measurements, etc. The amount 
of activity of either ribozyme domain can be measured using the appropriate assay. 

EXAMPLE 7 
Production of Dual DNA Domains 

The process of Example 1 is repeated except that two different DNA domains are used, 
each of which binds a protein. The plasmid DNA can be produced in large amounts, 
and the dual DNA domain molecule can be excised with a restriction endonuclease. The 
resulting fragment has the two linked DNA domains and can be assayed for its ability to 
bind to a DNA binding protein (e.g., a transcription factor, restriction endonuclease, 
polymerase, etc.). 

The references cited above are all incorporated by reference herein, whether 
specifically incorporated or not. 

Having now fully described this invention, it will be appreciated by those skilled 
in the art that the same can be performed within a wide range of equivalent parameters, 
concentrations, and conditions without departing from the spirit and scope of the 
invention and without undue experimentation. 

While this invention has been described in connection with specific 
embodiments thereof, it will be understood that it is capable of further modifications. 
This application is intended to cover any variations, uses, or adaptations of the invention 
following, in general, the principles of the invention and including such departures from 
the present disclosure as come within known or customary practice within the art to 
which the invention pertains and as may be applied to the essential features hereinbefore 
set forth as follows in the scope of the appended claims. 

SEQUENCE LISTING 

<110> REINL, Stephen 
LINDBO, John 
TURPEN, Thomas 

<120> CREATION OF VARIABLE LENGTH AND SEQUENCE LINKER REGIONS FOR DUAL- 
DOMAIN OR MULTI-DOMAIN MOLECULES 

<130> LSB-006 

<140> Unassigned 

<141> 2000-09-22 

<150> US 60/155,978 

<151> 1999-09-24 

<160> 51 

<170> PatentIn version 3.0 

<210> 1 

<211> 9 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker 

<400> 1 

Pro Gly Ile Ser Gly Gly Gly Gly Gly 
1 5 

<210> 2 

<211> 16 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 
<222> ()..() 
<223> linker 
<400> 2 

Asn Asn Asn Asn Asn Asn Asn Asn Asn Asn Leu Gly Ile Glu Gly Arg 
1 5 10 15 



<210> 3 

<211> 15 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker 

<400> 3 

Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser 
1 5 10 15 

<210> 4 

<211> 30 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 4 

gtggcatgca ggttcaactg gtggagtctg 30 

<210> 5 

<211> 26 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> (1)..(3) 

<223> "asy" can appear from 1 to 50 times before the remainder of the 
sequence 

<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 5 

asytgaggag acggtgacca gggttc 26 

<210> 6 

<211> 41 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 6 

asyasyasya syasyasytg aggagacggt gaccagggtt c 41 

<210> 7 

<211> 50 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 7 

asyasyasya syasyasyas yasyasytga ggagacggtg accagggttc 50 

<210> 8 

<211> 29 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> (1)..(3) 

<223> "rst" can appear from 1 to 50 times before the remainder of the 
sequence 

<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 8 

rstgacattc agatgaccca gtctccttc 29 

<210> 9 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 9 

caccctaggc tatcgtttga tcagtacctt ggtcccctg 39 

<210> 10 

<211> 44 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 10 

rstrstrstr strstrstga cattcagatg acccagtctc cttc 44 

<210> 11 

<211> 53 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 11 

rstrstrstr strstrstrs trstrstgac attcagatga cccagtctcc ttc 53 

<210> 12 

<211> 38 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 12 

atactgctac tggtgctagt actactgctg gtgctagt 38 

<210> 13 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 13 

Thr Thr Ala Thr Gly Ala Ser Thr Thr Ala Gly Ala Ser 
1 5 10 

<210> 14 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 14 

gctactgctg ctagtggtgc tgctgctggt ggtggtact 39 



<210> 15 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 15 

Ala Thr Ala Ala Ser Gly Ala Ala Ala Gly Gly Gly Thr 
1 5 10 

<210> 16 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 16 

gctactggtg ctagtactag tgctactgct ggtggtagt 39 

<210> 17 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 17 

Ala Thr Gly Ala Ser Thr Ser Ala Thr Ala Gly Gly Ser 
1 5 10 

<210> 18 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 18 

agtactgctg ctggtactag tagtggtagt agtactggt 39 

<210> 19 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 19 

Ser Thr Ala Ala Gly Thr Ser Ser Gly Ser Ser Thr Gly 
1 5 10 

<210> 20 

<211> 51 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 20 

gctagtactg ctactagtag tggtggtggt ggtactggta gtagtgctgc 

<210> 21 

<211> 17 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 21 

Ala Ser Thr Ala Thr Ser Ser Gly Gly Gly Thr Gly Ser Ser Ala Ala 
1 5 10 15 

Ala 

<210> 22 

<211> 60 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 22 

gctactagta ctgctgctgc tggtgctact agtgctactg gtggtgctag tggtactggt 60 

<210> 23 

<211> 20 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 23 

Ala Thr Ser Thr Ala Ala Ala Gly Ala Thr Ser Ala Thr Gly Gly Ala 
1 5 10 15 

Ser Gly Thr Gly 

20 

<210> 24 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 24 

actggtgcta gtggtgctac tagtagtggt agtagtagt 39 

<210> 25 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 25 

Thr Gly Ala Ser Gly Ala Thr Ser Ser Gly Ser Ser Ser 
1 5 10 

<210> 26 

<211> 31 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 26 

cctgcatgct ggaggtgcag ttggtggaat c 31 

<210> 27 

<211> 23 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> (1)..(3) 

<223> "asy" can appear from 1 to 50 times before the remainder of the 
sequence 

<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 27 

asyagaggag acggtgacca tga 23 

<210> 28 

<211> 32 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 28 

asyasyasya syagaggaga cggtgaccat ga 32 

<210> 29 

<211> 47 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 29 

asyasyasya syasyasyas yasyasyaga ggagacggtg accatga 47 

<210> 30 

<211> 22 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> (1)..(3) 

<223> "rst" can appear from 1 to 50 times before the remainder of the 
sequence 

<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 30 

rstcagtctg ccctgactca gt 22 

<210> 31 

<211> 34 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 31 

caccctaggt caaccaagga cggtcaggtt ggtc 34 

<210> 32 

<211> 37 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 32 

rstrstrstr strstrstca gtctgccctg actcagt 37 

<210> 33 

<211> 46 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 33 

rstrstrstr strstrstrs trstrstcag tctgccctga ctcagt 46 



<210> 34 

<211> 15 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 34 

ggtgctggtg gtggt 15 

<210> 35 

<211> 5 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 35 

Gly Ala Gly Gly Gly 
1 5 

<210> 36 

<211> 30 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 36 

actggtggtg gtggtggtag tggtggtggt 30 

<210> 37 

<211> 10 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 37 

Thr Gly Gly Gly Gly Gly Ser Gly Gly Gly 
1 5 10 

<210> 38 

<211> 36 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 38 

actactacta ctgctactac tgctggtagt ggtgct 36 



<210> 39 

<211> 12 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 39 

Thr Thr Thr Thr Ala Thr Thr Ala Gly Ser Gly Ala 
1 5 10 

<210> 40 

<211> 15 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 40 

gctagtactg gtgct 15 

<210> 41 

<211> 5 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 41 

Ala Ser Thr Gly Ala 
1 5 

<210> 42 

<211> 24 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 42 

agtactggta gtagtggtgc tggt 24 

<210> 43 

<211> 8 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 43 

Ser Thr Gly Ser Ser Gly Ala Gly 
1 5 



<210> 44 

<211> 21 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 44 

gctagtagtg gtgctagtgc t 21 

<210> 45 

<211> 7 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 45 

Ala Ser Ser Gly Ala Ser Ala 
1 5 



<210> 46 

<211> 39 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 46 

gctagtggtg gtactgctgg tactggtggt agtagtact 39 

<210> 47 

<211> 13 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 47 

Ala Ser Gly Gly Thr Ala Gly Thr Gly Gly Ser Ser Thr 
1 5 10 

<210> 48 

<211> 51 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region nucleotide sequence 

<400> 48 

actagtggta gtggtgctag tgctgctgct ggtggtgctg ctgctagtgc t 51 



<210> 49 

<211> 17 

<212> PRT 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> linker region amino acid sequence 

<400> 49 

Thr Ser Gly Ser Gly Ala Ser Ala Ala Ala Gly Gly Ala Ala Ala Ser 
1 5 10 15 

Ala 

<210> 50 

<211> 24 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 50 

rstrstrstr strstrstca tgcc 24 

<210> 51 

<211> 24 

<212> DNA 

<213> Artificial/Unknown 
<220> 

<221> misc_feature 

<222> ()..() 

<223> primer 

<400> 51 

ggcatgasya syasyasyas yasy 24 

WHAT IS CLAIMED IS: 



1. A library of dual-domain nucleic acid molecules each of which has 

(a) a first and a second domain; 

(b) separating and linking said domains, a linker which is a member of a 
randomized library of linkers that 

(i) vary in size and nucleotide sequence, 

(ii) consist of a repeated pattern of degenerate repeated triplet 
nucleotides. 

2. The library of molecules of claim 1, wherein said repeated pattern of 
degenerate repeated triplet nucleotides of said linkers having the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as 
position 2 of the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet. 

3. The library of molecules of claim 2 wherein the nucleotide in the first 
and second positions of each repeated triplet is selected from any two of 
deoxyadenosine, deoxyguanosine, deoxycytidine or deoxythymidine. 

4. The library of molecules of claim 3, wherein 

(i) position 1 of each repeated triplet is deoxyadenosine or deoxyguanosine; 

(ii) position 2 of each repeated triplet is deoxycytidine or deoxyguanosine; and 

(iii) position 3 of each repeated triplet is deoxythymidine. 

5. The library of molecules of claim 1 wherein at least one of said domains 
binds to a protein. 

6. The library of molecules of claim 5 wherein both of said domains bind to 
a protein. 

7. The library of molecules of claim 1 wherein at least one of said domains 
binds to a nucleic acid that is not a member of said library. 

8. The library of molecules of claim 7 wherein both of said domains bind to 
a nucleic acid that is not a member of said library. 

9. The library of molecules of any of claims 1-4 wherein said first and said 
second domains are coding sequences. 

10. The library of molecules of any of claims 1-8 produced in plant cells. 

11. The library of molecules of claim 9 produced in plant cells. 

12. A dual-domain nucleic acid molecule selected from the library of any of 
claims 1-8. 

13. A dual-domain nucleic acid molecule selected from the library of claim 9. 

14. A dual-domain nucleic acid molecule selected from the library of claim 10. 

15. A dual-domain nucleic acid molecule selected from the library of claim 11. 

16. A library of dual-domain polypeptide molecules each of which is 
described by the formula D1-L-D2 wherein 
(a) D1 and D2 are polypeptide domains and 

(b) L is a peptide or polypeptide linker which is a member of a randomized 
library of linkers that vary in size and sequence, which library is encoded 
by nucleic acid sequences consisting of a repeated pattern of degenerate 
repeated triplet nucleotides. 

17. A library of multi-domain polypeptide molecules each of which 
comprises polypeptide domains D each pair of which is linked by a peptide or 
polypeptide linker each molecule being described by the formula DxLy wherein 

x is an integer between 2 and 20, 

y is an integer between 1 and 19, with the proviso that for any value of x, y=x-1; 

D1 is bonded to a single C-terminal linker; 

the C-terminal-most D is bonded to a single N-terminal linker; 

each of D2 to D19 are bonded to a N-terminal and a C-terminal linker; 

each L is a member of a randomized library of linkers that vary in size and 
sequence, 

said linker library being encoded by nucleic acid sequences consisting of a repeated 
pattern of degenerate repeated triplet nucleotides. 

18. The library of dual domain polypeptide molecules of claim 16, or multi- 
domain polypeptide molecules of claim 17, wherein each linker in said library 

(i) has a length of between about one and 50 amino acid residues 

(ii) between 1 and about 20 different amino acids wherein each repeated 
pattern of degenerate triplet bases encodes between 1 and about 12 
different amino acids. 

19. The library of polypeptide molecules of claim 18, wherein said repeated 
pattern of degenerate repeated triplet nucleotides encoding said linkers having the 
following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as 
position 2 of the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet. 

20. The library of polypeptide molecules of claim 19 wherein the nucleotide 
in the first and second positions of each repeated triplet is selected from any two of 
deoxyadenosine, deoxyguanosine, deoxycytidine or deoxythymidine. 

21. The library of polypeptide molecules of claim 20, wherein 

(i) position 1 of each repeated triplet is deoxyadenosine or deoxyguanosine; 

(ii) position 2 of each repeated triplet is deoxycytidine or deoxyguanosine; and 

(iii) position 3 of each repeated triplet is deoxythymidine. 

22. The library of dual-domain polypeptide molecules of claim 16 or multi- 
domain polypeptide molecules of claim 17 produced in plant cells. 

23. The library of polypeptide molecules of claim 18 produced in plant cells. 

24. The library of polypeptide molecules of claim 19 produced in plant cells. 

25. The library of polypeptide molecules of claim 20 produced in plant cells. 

26. The library of polypeptide molecules of claim 21 produced in plant cells. 

27. A dual-domain polypeptide molecule selected from the library of claim 16. 

28. A multi-domain polypeptide molecule selected from the library of claim 17. 

29. A dual domain polypeptide molecule or multi-domain polypeptide 
molecule selected from the library of claim 18. 

30. A dual domain polypeptide molecule or multi-domain polypeptide 
molecule selected from the library of claim 19. 

31. A dual domain polypeptide molecule or multi-domain polypeptide 
molecule selected from the library of claim 20. 

32. A dual domain polypeptide molecule or multi-domain polypeptide 
molecule selected from the library of claim 21. 

33. A three domain peptide selected from the library of claim 17 
which is a dual domain scFv polypeptide linked to a third polypeptide domain. 

34. The three domain polypeptide of claim 33 wherein the third domain is a 
toxin polypeptide or an enzyme. 

35. A method of generating the library of dual-domain nucleic acids of 
claim 1, comprising: 

a. obtaining two template DNA sequences that comprise the first and the 
second domains; 

b. preparing amplification primer pairs which amplify the first and second 
domains where each primer pair comprises an upstream primer and a 
downstream primer, each primer having a 5' end and a 3' end, wherein 
the downstream primer for the first domain or the upstream primer for 
the second domain comprises a nontemplated sequence, 

said nontemplated sequence comprising a repeated pattern of 

degenerate repeated triplet nucleotides, 

wherein at least two of the 5' terminal triplets of said 
repeated pattern of degenerate repeated triplet nucleotides 
have the same degenerate sequence; 

c. amplifying the domains with the amplification primers to generate at 
least one population of nucleic acid domains having different lengths and 
sequences in the non-templated sequence; and 

d. ligating the nucleic acid domains generated in step (c) to generate said 
population of dual-domain molecules. 

36. The method of claim 35, wherein said repeated pattern of degenerate 
repeated triplet nucleotides in at least one of said primers has the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as 
position 2 of the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet. 
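
Properties (i) to (iii) can be checked mechanically for a candidate degenerate triplet. The sketch below assumes one possible reading of the claim language: a property holds for a degenerate triplet only if it holds for every concrete triplet that the degenerate triplet encodes. That reading, the function names and the example patterns are assumptions made for illustration; the ambiguity codes are the standard IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_triplet(degenerate):
    """Return every concrete triplet encoded by a 3-letter IUPAC pattern."""
    if len(degenerate) != 3:
        raise ValueError("a triplet must have exactly three positions")
    return ["".join(p) for p in product(*(IUPAC[c] for c in degenerate))]

def satisfied_conditions(degenerate):
    """Return which of properties i, ii, iii hold for all concrete triplets."""
    triplets = expand_triplet(degenerate)
    # Zero-based index pairs compared by each property.
    pairs = {"i": (0, 1), "ii": (1, 2), "iii": (0, 2)}
    return {name for name, (a, b) in pairs.items()
            if all(t[a] != t[b] for t in triplets)}

# Hypothetical patterns, used only to exercise the check.
print(satisfied_conditions("RYS"))
print(satisfied_conditions("NNN"))
```

Because the properties are joined by "or", a pattern would meet the claim language under this reading whenever the returned set is non-empty.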

37. The method of claim 35 wherein at least one of the primers contains a 
non-templated endonuclease recognition site. 

38. The method of claim 35 wherein said template DNA sequences are made 
by reverse transcription of mRNA. 

39. The method of claim 35 further comprising the step of ligating the 
population of dual-domain nucleic acids to vectors. 

40. The method of claim 39, further comprising the step of introducing said 
vector into a host. 

41. The method of claim 40 wherein said nucleic acid domains encode
polypeptide domains, and which method further comprises the step of expressing dual- 
domain polypeptides encoded by said dual-domain nucleic acids. 

42. The method of claim 39, further comprising the step of
transcribing RNA from said vectors.

43. The method of claim 42 wherein said vectors are compatible with
replication and/or expression of said nucleic acids in plant cells, said method further
comprising the steps of introducing said transcribed RNA into a plant cell and
expressing the dual-domain polypeptide. 

44. A population of dual-domain polypeptides or a dual-domain polypeptide 
selected therefrom, produced by the method of claim 41. 

45. A population of dual-domain polypeptides or a dual-domain polypeptide
selected therefrom, produced in plant cells by the method of claim 43. 

46. A method of producing the polypeptide of claim 27 comprising the steps of: 

(a) joining a nucleic acid encoding the first domain of the polypeptide to a 
nucleic acid encoding a first part of a linker to produce a first nucleic 
acid construct; 

(b) joining the nucleic acid encoding a second part of the linker to a nucleic 
acid encoding the second domain of the polypeptide to produce a second 
nucleic acid construct; 

(c) incorporating said first and said second constructs into a transient plant
expression vector in frame so that, when expressed, the polypeptide bears
the first and second domain separated by the linker as described by the
formula D1-L-D2;

(d) transfecting a plant with the vector so that the plant transiently produces 
the polypeptide; and 

(e) recovering the polypeptide as a soluble, functionally-folded protein.

47. The method of claim 46 wherein the plant is a plant cell. 

48. A linker nucleic acid molecule or sequence that joins two nucleic acid 
domains or two nucleic acid sequences encoding two polypeptide domains, which has a 
pattern of degenerate repeated triplet nucleotides with the following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as 
position 2 of the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide as 
position 3 of the repeated triplet; and 

(iv) wherein said molecule or sequence that joins said domains does not 
encode Gly4Ser or a repeat thereof. 
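
Property (iv) can be screened for computationally. The sketch below translates a linker sequence in frame with the standard genetic code and reports whether the encoded peptide contains the motif GGGGS. Treating "does not encode Gly4Ser or a repeat thereof" as "the translation contains no GGGGS motif" is an assumption of this sketch, as are the function names and the example sequences.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def encodes_gly4ser(dna):
    """Return True if the translation contains the motif Gly-Gly-Gly-Gly-Ser."""
    return "GGGGS" in translate(dna)

# Hypothetical linker sequences, used only to exercise the check.
print(encodes_gly4ser("GGTGGAGGCGGGTCT"))
print(encodes_gly4ser("GCTGGTTCT"))
```

A repeat of the motif necessarily contains a single copy, so one substring test covers both cases under the stated reading.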

49. A library of linker nucleic acid molecules or sequences each of which 
joins two nucleic acid domains or two nucleic acid sequences encoding two polypeptide 
domains, each of which has a pattern of degenerate repeated triplet nucleotides with the 
following properties: 

(i) position 1 of each repeated triplet cannot be the same nucleotide as
position 2 of the repeated triplet; or

(ii) position 2 of each repeated triplet cannot be the same nucleotide as
position 3 of the repeated triplet; or

(iii) position 1 of each repeated triplet cannot be the same nucleotide as
position 3 of the repeated triplet; and

(iv) wherein each of said molecules or sequences that joins said domains does 
not encode Gly4Ser or a repeat thereof. 

50. A method for making the library of linker nucleic acid molecules or
sequences of claim 49, comprising: 

(a) obtaining two template DNA sequences that comprise the first and the 
second domains; 

(b) preparing amplification primer pairs which amplify the first and second 
domains where each primer pair comprises an upstream primer and a 
downstream primer, each primer having a 5' end and a 3' end, wherein 
the downstream primer for the first domain or the upstream primer for 
the second domain comprises a nontemplated sequence, 

said nontemplated sequence comprising said repeated pattern of 
degenerate repeated triplet nucleotides, wherein at least two of the 
5' terminal triplets of said repeated pattern of degenerate repeated 
triplet nucleotides have the same degenerate sequence; 

(c) amplifying the domains with the amplification primers to generate at 
least one population of nucleic acid domains having different lengths and 
sequences in the non-templated sequence;

(d) ligating the nucleic acid domains generated in step (c) to generate said
population of dual-domain molecules; and

(e) excising or amplifying said linker nucleic acid molecules or sequences 
from said population of dual domain molecules. 

51. A method for making a linker nucleic acid molecule or sequence that
joins two nucleic acid domains or two nucleic acid sequences encoding two polypeptide
domains, which has a pattern of degenerate repeated triplet nucleotides with the
following properties:

(i) position 1 of each repeated triplet cannot be the same nucleotide 
as position 2 of the repeated triplet; or 

(ii) position 2 of each repeated triplet cannot be the same nucleotide 
as position 3 of the repeated triplet; or 

(iii) position 1 of each repeated triplet cannot be the same nucleotide
as position 3 of the repeated triplet; and 

(iv) wherein said molecule or sequence that joins said domains does 
not encode Gly4Ser or a repeat thereof,

said method comprising the steps of:

(a) making the library of linker nucleic acid molecules or sequences in
accordance with the method of claim 49; and

(b) selecting and isolating said linker molecule or sequence from said library. 

ABSTRACT OF THE DISCLOSURE

Disclosed are methods and compositions for creating a DNA, RNA or protein 
molecule with two or more nucleic acid or polypeptide domains, respectively, joined by 
a linker region. These methods are used to generate random linker libraries of nucleic 
acids that encode dual-domain or multi-domain polypeptides. The linker regions are 
characterized by both length and sequence variability.

DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION

English Language Declaration 

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, first and 
joint inventor (if plural names are listed below) of the subject matter which is claimed and for which a 
patent is sought on the invention entitled, CREATION OF VARIABLE LENGTH AND SEQUENCE 
LINKER REGIONS FOR DUAL-DOMAIN OR MULTI-DOMAIN MOLECULES the specification of 
which 

☒ is attached hereto.

□ was filed on as Serial No.
and was amended on (if applicable)

□ was described and claimed in PCT International Application No. ,
filed on and as amended under PCT Article 19 on
(if any).

I hereby state that I have reviewed and understand the contents of the above identified specification,
including the claims, as amended by any amendment referred to above.

I acknowledge the duty to disclose information which is material to patentability as defined in Title 37,
Code of Federal Regulations, §1.56.

I hereby claim foreign priority benefits under Title 35, United States Code, §119(a)-(d) of any foreign
application(s) for patent or inventor's certificate listed below and have also identified below any foreign
application for patent or inventor's certificate having a filing date before that of the application on which
priority is claimed:

PROVISIONAL APPLICATION NUMBER FILING DATE

60/155,978 September 24, 1999 

I hereby claim the benefit under Title 35, United States Code, §120 of any United States application(s)
listed below and insofar as the subject matter of each of the claims of this application is not disclosed in the
prior United States application in the manner provided by the first paragraph of Title 35, United States
Code §112, I acknowledge the duty to disclose material to patentability as defined in Title 37, Code of
Federal Regulations, §1.56 which became available between the filing date of the prior application and the
national or PCT international filing date of this application:



(Application Serial No.) (Filing Date) (Status) 

(patented, pending, abandoned) 

I hereby declare that all statements made herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and further that these statements were made with
the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or
both, under Section 1001 of Title 18 of the United States Code and that such willful false statements may
jeopardize the validity of the application or any patent issued thereon. 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorney(s) and/or
agent(s) to prosecute this application and transact all business in the Patent and Trademark Office 
connected therewith. 

Ronald P. Kananen, Reg. No. 24,104; Shmuel Livnat, Reg. No. 33,949; Ralph T. Rader, Reg. No. 
28,772; Michael D. Fishman, Reg. No. 31,951, Richard D. Grauer, Reg. No. 22,388; Joseph V. 
Coppola, Sr., Reg. No. 33,373; Michael B. Stewart, Reg. No. 36,018; Christopher M. Tanner, Reg. 
No. 41,518; David K. Benson, Reg. No. 42,314; Robert S. Green, Reg. No. 41,800.

Send Correspondence to:
Shmuel Livnat
RADER, FISHMAN & GRAUER
The Lion Building
1233 20th Street, N.W., Suite 501
Washington, D.C. 20036

Direct telephone calls to:
Shmuel Livnat
(202) 955-3750



Full name of sole or first inventor
Stephen J. REINL 



Inventor's signature Date 



Residence: Sacramento, CA 

Citizenship U.S.A. 

Post Office Address 943 Piedmont Drive 

Sacramento, CA 95822 

Full name of second joint inventor 

John A. LINDBO 

Second Inventor's signature 

Date 



Residence: Vacaville, CA 

Citizenship U.S.A. 

Post Office Address 143 Sundance Drive 

Vacaville, CA 95688 

Thomas TURPEN

Date

Citizenship U.S.A.

Post Office Address 160 Santa Fe Court

Date